NOVEL BRAIN EXPRESSED GENE AND PROTEIN ASSOCIATED WITH BIPOLAR DISORDER


FIELD OF THE INVENTION : 

The invention is broadly concerned with the determination of genetic factors associated
with psychiatric health. More particularly, the present invention is directed to a human
gene which is linked to a mood disorder or related disorder in affected individuals and
their families. Specifically, the present invention is directed to a gene located on
chromosome 18 that is expressed in brain tissue and may be used as a
diagnostic marker for bipolar disorder.

BACKGROUND OF THE INVENTION : 
Pharmacogenetics background: 

Every individual is a product of the interaction of their genes and the environment.
Pharmacogenetics is the study of how genetic differences influence the variability in
patients' responses to drugs. Through the use of pharmacogenetics, we will soon be
able to profile variations between individuals' DNA to predict responses to a particular
medicine. Target validation that will predict a well-tolerated and effective medicine for
a clinical indication in humans is a widely perceived problem, but the real challenge is
target selection. A limited number of molecular target families have been identified,
including receptors and enzymes, for which high-throughput screening is currently
possible. A good target is one against which many compounds can be screened rapidly
to identify active molecules (hits). These hits can be developed into optimized
molecules (leads), which have the properties of well-tolerated and effective medicines.
Selection of targets that can be validated for a disease or clinical symptom is a major
problem faced by the pharmaceutical industry. The best-validated targets are those that
have already produced well-tolerated and effective medicines in humans (precedent
targets). Many targets are chosen on the basis of scientific hypotheses and do not lead
to effective medicines because the initial hypotheses are often subsequently disproved.

Two broad strategies are being used to identify genes and express their protein products
for use as high-throughput targets. These approaches of genomics and genetics share
technologies but represent distinct scientific tactics and investments. Discovery
genomics uses the increasing number of databases of DNA sequence information to
identify genes and families of genes for tractable or screenable targets that are not
known to be genetically related to disease.

The advantage of information on disease-susceptibility genes derived from patients is
that, by definition, these genes are relevant to the patients' genetic contributions to the
disease. However, most susceptibility genes will not be tractable targets or amenable
to high-throughput screening methods to identify active compounds.
The differential metabolism related to the relevant gene variants can be studied with
focused functional genomic and proteomic technologies to discover mechanisms of
disease development or progression.

Critical enzymes or receptors associated with the altered metabolism can be used as
targets. Gene-to-function-to-target strategies that focus on the role of the specific
susceptibility gene variants in appropriate cellular metabolism become important.
Data mining of sequences from the Human Genome Project and similar programmes
with powerful bioinformatic tools has made it possible to identify gene families by
locating domains that possess similar sequences. Genes identified by these genomic
strategies generally require some sort of functional validation or relationship to a
disease process. Technologies such as differential gene expression, transgenic animal
models, proteomics, in situ hybridization and immunohistochemistry are used to imply
relationships between a gene and a disease.

The major distinction between the genomic and genetic approaches is target selection,
with genetically defined genes and variant-specific targets already known to be
involved in the disease process. The current vogue of discovery genomics for
nonspecific, wholesale gene identification, with each gene in search of a relationship to
a disease, creates great opportunities for development of medicines.



It is also critical to realize that the core problem for drug development is poor target
selection. The screening use of unproven technologies to imply disease-related
validation, and the huge investment necessary to progress each selected gene to proof
of concept in humans, is based on an unproven and cavalier use of the word
'validation'. Each failure is very expensive in lost time and money. For example,
differential gene expression (DGE) and proteomics are screening technologies that are
widely used for target validation. They detect different levels and/or patterns of gene
and protein expression in tissues, which may be used to imply a relationship to a
disease affecting that tissue.

Mood Disorder Background: 

Mood disorders or related disorders include but are not limited to the following
disorders as defined in the Diagnostic and Statistical Manual of Mental Disorders,
version 4 (DSM-IV) taxonomy (DSM-IV codes in parentheses): mood disorders
(296.XX, 300.4, 311, 301.13, 295.70), schizophrenia and related disorders
(295.XX, 297.1, 298.8, 297.3, 298.9), anxiety disorders (300.XX, 309.81, 308.3),
adjustment disorders (309.XX) and personality disorders (301.XX).

The present invention is particularly directed to genetic factors associated with a family
of mood disorders known as bipolar (BP) spectrum disorders. Bipolar disorder (BP) is
a severe psychiatric condition that is characterized by disturbances in mood, ranging
from an extreme state of elation (mania) to a severe state of dysphoria (depression).
Two types of bipolar illness have been described: type I BP illness (BPI) is
characterized by major depressive episodes alternating with phases of mania, and type II
BP illness (BPII) is characterized by major depressive episodes alternating with phases
of hypomania. Relatives of BP probands have an increased risk for BP, unipolar
disorder (patients only experiencing depressive episodes; UP), cyclothymia (minor
depression and hypomania episodes; cy) as well as for schizoaffective disorders of the
manic (SAm) and depressive (SAd) type. Based on these observations BP, cy, UP and
SA are classified as BP spectrum disorders.

The involvement of genetic factors in the etiology of BP spectrum disorders was
suggested by family, twin and adoption studies (Tsuang and Faraone (1990), The
Genetics of Mood Disorders, Baltimore, The Johns Hopkins University Press). However,
the exact pattern of transmission is unknown. In some studies, complex segregation
analysis supports the existence of a single major locus for BP (Spence et al. (1995), Am
J. Med. Genet. (Neuropsych. Genet.) QQ pp 370-376). Other researchers propose a
liability-threshold model, in which the liability to develop the disorder results from the
additive combination of multiple genetic and environmental effects (McGuffin et al.
(1994), Affective Disorders; Seminars in Psychiatric Genetics, Gaskell, London pp
110-127).

Due to the complex mode of inheritance, parametric and non-parametric linkage
strategies are applied in families in which BP disorder appears to be transmitted in a
Mendelian fashion. Early linkage findings on chromosomes 11p15 (Egeland et al.
(1987), Nature - pp 783-787) and Xq27-q28 (Mendlewicz et al. (1987), The Lancet 1 pp
1230-1232; Baron et al. (1987) Nature 12& pp 289-292) have been controversial and
could initially not be replicated (Kelsoe et al. (1989) Nature ~ pp 238-243; Baron et al.
(1993) Nature Genet - pp 49-55). With the development of a human genetic map
saturated with highly polymorphic markers and the continuous development of data
analysis techniques, numerous new linkage searches were started. In several studies,
evidence or suggestive evidence for linkage to particular regions on chromosomes 4,
12, 18, 21 and X was found (Blackwood et al. (1996) Nature Genetics - pp 427-430,
Craddock et al. (1994) Brit J. Psychiatry ~ pp 355-358, Berrettini et al. (1994), Proc
Natl Acad Sci USA - pp 5918-5921, Straub et al. (1994) Nature Genetics ~ pp 291-296
and Pekkarinen et al. (1995) Genome Research 2 pp 105-115). In order to test the
validity of the reported linkage results, these findings have to be replicated in other,
independent studies.

Recently, linkage of bipolar disorder to the pericentromeric region on chromosome 18
was reported (Berrettini et al. 1994). Also, a ring chromosome 18 with break-points
and deleted regions at 18pter-p11 and 18q23-qter was reported in three unrelated
patients with BP illness or related syndromes (Craddock et al. 1994). The chromosome
18p linkage was replicated by Stine et al. (1995) Am J. Hum Genet 22 pp 1384-1394,
who also reported suggestive evidence for a locus on 18q21.2-q21.32 in the same
study.

Interestingly, Stine et al. observed a parent-of-origin effect: the evidence of linkage was
the strongest in the paternal pedigrees, in which the proband's father or one of the
proband's father's sibs is affected. Several studies described anticipation in families
transmitting BP disorder (McInnis et al 1993, Nylander et al 1994), suggesting the
involvement of trinucleotide repeat expansions (TREs), considering that a number of
diseases caused by an expansion of a CAG/CTG, a CCG/CGG or a GAA/TTC repeat
show anticipation (reviewed by Margolis et al 1999). Previous efforts
to find potentially expanded repeats have primarily focused on CAG/CTG repeats,
although the search for CCG/CGG repeats is increasing (Kleiderlein et al 1998, Mangel
et al 1998, Eichhammer et al 1998, Kaushik et al 2000). Previously, we reported on a
new method for the region-specific isolation of triplet repeats: triplet repeat YAC
fragmentation (Del Favero et al 1999). This proved to be a valid method for the
isolation of CAG/CTG repeats and, using this method, we excluded the involvement of
CAG/CTG repeats from within 18q21.33-q23 in bipolar disorder (Goossens et al 2000).
The present invention adapted the method for the region-specific isolation of
CCG/CGG repeats and applied it to the chromosome 18q21.33-q23 BP candidate
region.

SUMMARY OF THE INVENTION : 

The present invention is directed to a novel gene and the protein encoded by that gene.

The novel gene is located in an 8.9 cM chromosome region between D18S68
and D18S979 at 18q21.33-q23. A physical map was constructed using yeast artificial
chromosomes (YACs) (Verheyen et al 1999).

The previously described method was adapted for the region-specific isolation of
CCG/CGG repeats and applied to the chromosome 18q21.33-q23 BP candidate region.
Three potential CpG islands were isolated, one of which is located 1.5 kb upstream of
a predicted exon of 3639 bp. Further analysis showed this was part of a novel
CpG-associated, brain-expressed gene, herein called NCAG1 (Novel CpG Associated Gene
1). Mutation analysis of this positional and functional candidate identified two single
nucleotide polymorphisms, which may be useful as diagnostic markers for the BP
phenotype.

BRIEF DESCRIPTION OF THE DRAWING 

Figure 1: List of all human ESTs found by BLASTN alignment searches of dbEST.
ESTs are named with their GenBank Acc Nos. I.M.A.G.E. Consortium [LLNL] cDNA
Clones (Lennon et al 1996) are named with their RZPD clone ID.

Figure 2: Minimal YAC tiling path of the 18q21.33-q23 BP candidate
region (Verheyen et al 1999). The YACs are represented by solid lines, the CCG/CGG
fragmentation products by dotted lines. YAC sizes, between brackets, are estimated by
PFGE analysis. Solid circles indicate positive STS/STR hits. Shaded boxes highlight
the CCG/CGG repeat and the three CpG islands isolated by YAC fragmentation.

Figure 3: Feature map of NCAG1. a) Features predicted by bioinformatics. They
encompass the CpG island as predicted by LCP (Huang 1994) and CPG (Larsen et al
1992), the ORF or exon as predicted by Grail (Uberbacher & Mural 1991) and
Genscan (Burge & Karlin 1997), the transcription start site (TSS) as predicted by
Proscan (Prestridge 1995) and the relevant polyadenylation signals as predicted by
PolyAH (Salamov & Solovyev 1997). The numbers below the features indicate the
scores as returned by Proscan and PolyAH. b) Alignment of EST hits. ESTs are named
with their GenBank Acc Nos. c) Alignment of cDNA clones. I.M.A.G.E. Consortium
[LLNL] cDNA Clones (Lennon et al 1996) are named with their RZPD clone ID. d)
RT-PCR products. The grey bars represent the RT-PCR product, the thin black lines
represent the sequences obtained on the nested PCRs.



DETAILED DESCRIPTION OF THE INVENTION: 

The present invention is directed to a novel gene located in the 18q candidate region
of chromosome 18. More specifically, the gene is located in an 8.9
cM region between D18S68 and D18S979 at 18q21.33-q23.
The gene is located in a chromosomal region associated with mood disorders such as
bipolar spectrum disorders and may therefore be useful as a diagnostic marker for
bipolar spectrum disorders. The region in question, when removed from the totality of
the human genome, may also be used to locate, isolate and sequence other genes which
influence psychiatric health and mood.

Isolation and identification of the novel gene:

Standard procedures well-known to one skilled in the art were applied to the identified
YAC clones and, where applicable, to the DNA from an individual afflicted with a
mood disorder as defined herein, in the process of identifying and characterizing the
relevant gene. For example, the inventors are able to make use of the previously
identified apparent association between trinucleotide repeat expansions (TREs) within
the human genome and the phenomenon of anticipation in mood disorders (Lindblad et
al. (1995), Neurobiology of Disease 2. pp 55-62 and O'Donovan et al. (1995), Nature
Genetics 1Q pp 380-381) to screen for TREs in the selected YAC clones in order to
identify candidate genes in the region of interest on human chromosome 18. A variety
of other known procedures can also be applied to the said YAC clones to identify the
candidate gene as discussed below.

Accordingly, in a first aspect the present invention comprises the use of an 8.9 cM
region of human chromosome 18q disposed between polymorphic markers D18S68 and
D18S979 or a fragment thereof for identifying at least one human gene, including
mutated and polymorphic variants thereof, which is associated with mood disorders or
related disorders as defined above. As will be described below, the present inventors
have identified this candidate region of chromosome 18q for such a gene by analysis of
co-segregation of bipolar disease in family MAD31 with 12 STR polymorphic markers
previously located between D18S51 and D18S61 and subsequent LOD score analysis.
Particular YACs covering the candidate region which may be used in accordance with
the present invention are 961-h-9, 942-c-3, 766-f-12, 731-c-7, 907-e-1, 752-g-8 and
717-d-3, preferred ones being 961-h-9, 766-f-12 and 907-e-1 since these have the
minimum tiling path across the candidate region. Suitable YAC clones for use are those
having an artificial chromosome spanning the refined candidate region between
D18S68 and D18S979.
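
As an illustration of the LOD score analysis mentioned above, the following is a minimal sketch of a two-point LOD score for the simplest textbook case of phase-known, fully informative meioses. The function name, its parameters and the example counts are assumptions made for illustration; they are not taken from the MAD31 analysis, which would normally be carried out with dedicated linkage software on the full pedigree.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    where k is the number of recombinants among n meioses.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    k, n = recombinants, meioses
    # Log-likelihood under linkage at recombination fraction theta.
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    # Log-likelihood under free recombination (theta = 0.5).
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical counts: 1 recombinant in 10 informative meioses.
for theta in (0.05, 0.1, 0.2, 0.5):
    print(theta, round(lod_score(1, 10, theta), 3))
```

By convention a LOD score of 3 or more is usually taken as evidence for linkage; the score is zero at a recombination fraction of 0.5 by construction.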

There are a number of methods which can be applied to the candidate regions of
chromosome 18q as defined above, whether or not present in a YAC, to identify a
candidate gene or genes associated with mood disorders or related disorders. For
example, as aforesaid, there is an apparent association between the extent of
trinucleotide repeat expansions (TREs) in the human genome and the presence of mood
disorders.

Accordingly, in a third aspect the present invention comprises a method of identifying
at least one human gene, including mutated and polymorphic variants thereof, which is
associated with a mood disorder or related disorder as defined herein which comprises
detecting nucleotide triplet repeats in the region of human chromosome 18q disposed
between polymorphic markers D18S68 and D18S979.

An alternative method of identifying said gene or genes comprises fragmenting a YAC
clone comprising a portion of human chromosome 18q disposed between polymorphic
markers D18S60 and D18S61, for example one or more of the seven aforementioned
YAC clones, and detecting any nucleotide triplet repeats in said fragments, in particular
repeats of CAG or CTG. Nucleic acid probes comprising at least 5 and preferably at
least 10 CTG and/or CAG triplet repeats are a suitable means of detection when
appropriately labelled. Trinucleotide repeats may also be determined using the known
RED (repeat expansion detection) system (Schalling et al. (1993), Nature Genetics - pp
135-139).
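
Once fragment sequences are available, the same repeat criterion can also be checked in silico. The following is a minimal sketch, assuming plain DNA strings as input; the function name and the example sequence are hypothetical, and the default threshold of 5 repeats simply mirrors the minimum probe length mentioned above.

```python
import re


def find_triplet_repeats(seq: str, unit: str = "CAG", min_repeats: int = 5):
    """Return (start, repeat_count) for each uninterrupted run of `unit`.

    Only perfect tandem repeats are reported; interrupted or imperfect
    repeats would need a more tolerant search.
    """
    pattern = re.compile(f"(?:{re.escape(unit.upper())}){{{min_repeats},}}")
    return [
        (m.start(), len(m.group()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical fragment: a run of 6 CAG units, then a run of only 2.
fragment = "TT" + "CAG" * 6 + "A" + "CAG" * 2
print(find_triplet_repeats(fragment))           # runs of >= 5 CAG units
print(find_triplet_repeats(fragment, "CTG"))    # no CTG runs on this strand
```

Searching for both CAG and CTG covers the repeat on either strand of the fragment.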

In a fourth embodiment the invention comprises a method of identifying at least one
gene, including mutated and polymorphic variants thereof, which is
associated with a mood disorder or related disorder and which is present in a YAC
clone spanning the region of human chromosome 18q between polymorphic markers
D18S60 and D18S61, the method comprising the step of detecting the expression
product of a gene incorporating nucleotide triplet repeats by use of an antibody capable
of recognizing a protein with an amino acid sequence comprising a string of at least 8,
but preferably at least 12, continuous glutamine residues. Such a method may be
implemented by sub-cloning YAC DNA, for example from the seven aforementioned
YAC clones, into a human DNA expression library. A preferred means of detecting the
relevant expression product is by use of a monoclonal antibody, in particular mAb 1C2,
the preparation and properties of which are described in International Patent
Application Publication No WO 97/17445.
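
The glutamine-stretch criterion above can also be applied computationally to a predicted protein sequence. The sketch below is an illustrative assumption rather than part of the described antibody method: the function name and the example sequence are hypothetical, and the thresholds of 8 and 12 residues are taken from the text.

```python
import re


def polyglutamine_tracts(protein: str, min_len: int = 8):
    """Return (start, length) of each run of at least `min_len` glutamines (Q)."""
    return [
        (m.start(), len(m.group()))
        for m in re.finditer(f"Q{{{min_len},}}", protein.upper())
    ]


# Hypothetical one-letter protein sequence with a 12-residue and a 5-residue run.
protein = "MA" + "Q" * 12 + "LP" + "Q" * 5
print(polyglutamine_tracts(protein))        # minimum of 8 residues
print(polyglutamine_tracts(protein, 12))    # preferred minimum of 12 residues
```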

Further embodiments of the present invention relate to methods of identifying the
relevant gene or genes which involve the sub-cloning of YAC DNA as defined above
into vectors such as BAC (bacterial artificial chromosome) or PAC (P1 or phage
artificial chromosome) or cosmid vectors such as exon-trap cosmid vectors. The
starting point for such methods is the construction of a contig map of the region of
human chromosome 18q between polymorphic markers D18S60 and D18S61. To this
end the present inventors have sequenced the end regions of the fragment of human
DNA in each of the seven aforementioned YAC clones and these sequences are
disclosed herein. Following sub-cloning of YAC DNA into other vectors as described
above, probes comprising these end sequences or portions thereof, in particular those
sequences shown in Figures 1 to 11 herein, together with any known sequence tagged
site (STS) in this region, as described in the YAC clone contig shown herein, can be
used to detect overlaps between said sub-clones and a contig map can be constructed.
Also the known sequences in the current YAC contig can be used for the generation of
contig map sub-clones.

One route by which a gene or genes associated with a mood disorder or
associated disorder can be identified is by use of the known technique of exon trapping.
This is an artificial RNA splicing assay, most often making use in current protocols of a
specialized exon-trap cosmid vector. The vector contains an artificial mini-gene
consisting of a segment of the SV40 genome containing an origin of replication and a
powerful promoter sequence, two splicing-competent exons separated by an intron
which contains a multiple cloning site, and an SV40 polyadenylation site.
The YAC DNA is sub-cloned in the exon-trap vector and the recombinant DNA is
transfected into a strain of mammalian cells. Transcription from the SV40 promoter
results in an RNA transcript which normally splices to include the two exons of the
minigene. If the cloned DNA itself contains a functional exon, it can be spliced to the
exons present in the vector's minigene. Using reverse transcriptase a cDNA copy can be
made and, using specific PCR primers, splicing events involving exons of the insert
DNA can be identified. Such a procedure can identify coding regions in the YAC DNA
which can be compared to the equivalent regions of DNA from a person afflicted with
a mood disorder or related disorder to identify the relevant gene.

Accordingly, in a fifth aspect the invention comprises a method of identifying at least 
one human gene, including mutated variants and polymorphisms thereof, which is 
associated with a mood disorder or related disorder which comprises the steps of: 

(1) transfecting mammalian cells with exon trap cosmid vectors prepared and mapped
as described above;

(2) culturing said mammalian cells in an appropriate medium; 

(3) isolating RNA transcripts expressed from the SV40 promoter; 

(4) preparing cDNA from said RNA transcripts; 

(5) identifying splicing events involving exons of the DNA sub-cloned into said exon
trap cosmid vectors to elucidate positions of coding regions in said sub-cloned DNA;

(6) detecting differences between said coding regions and equivalent regions in the 
DNA of an individual afflicted with said mood disorder or related disorder; and 



WO 02/101044 






PCT/EP02/06316 



(7) identifying said gene or mutated or polymorphic variant thereof which is associated
with said mood disorder or related disorders.

As an alternative to exon trapping the YAC DNA may be sub-cloned into BAC, PAC,
cosmid or other vectors and a contig map constructed as described above. There are a
variety of known methods available by which the position of relevant genes on the
sub-cloned DNA can be established as follows:

(a) cDNA selection or capture (also called direct selection): this
method involves the forming of genomic DNA/cDNA heteroduplexes by hybridizing a
cloned DNA (e.g. an insert of a YAC DNA) to a complex mixture of cDNAs, such as
the inserts of all cDNA clones from a specific (e.g. brain) cDNA library. Related
sequences will hybridize and can be enriched in subsequent steps using
biotin-streptavidin capturing and PCR (or related techniques);

(b) hybridization to mRNA/cDNA: a genomic clone (e.g. the insert of a specific
cosmid) can be hybridized to a Northern blot of mRNA from a panel of cultured cell
lines or against appropriate (e.g. brain) cDNA libraries. A positive signal can indicate
the presence of a gene within the cloned fragment;

(c) CpG island identification: CpG or HTF islands are short (about 1 kb)
hypomethylated GC-rich (> 60%) sequences which are often found at the 5' ends of
genes. CpG islands often have restriction sites for several rare-cutter restriction
enzymes. Clustering of rare-cutter restriction sites is indicative of a CpG island and
therefore of a possible gene. CpG islands can be detected by hybridization of a DNA
clone to Southern blots of genomic DNA digested with rare-cutting enzymes, or by
island-rescue PCR (isolation of CpG islands from YACs by amplifying sequences
between islands and neighbouring Alu-repeats);

(d) zoo-blotting: hybridizing a DNA clone (e.g. the insert of a specific cosmid) at
reduced stringency against a Southern blot of genomic DNA samples from a variety of
animal species. Detection of hybridization signals can suggest conserved sequences,
indicating a possible gene.

Accordingly, in a sixth aspect the invention comprises a
method of identifying at least one human gene, including mutated and polymorphic
variants thereof, which is associated with a mood disorder or related disorder which
comprises the steps of:

(1) sub-cloning the YAC DNA as described above into a cosmid, BAC, PAC or other 
vector; 






(2) using the nucleotide sequences shown in any one of Figures 1 to 11 or any other
sequence tagged site (STS) in this region as in the YAC clone contig described herein,
or part thereof consisting of not less than 14 contiguous bases or the complement
thereof, to detect overlaps amongst the sub-clones and construct a map thereof;

(3) identifying the position of genes within the sub-cloned DNA by one or more of
CpG island identification, zoo-blotting, hybridization of the sub-cloned DNA to a
cDNA library or a Northern blot of mRNA from a panel of cultured cell lines;

(4) detecting differences between said genes and equivalent regions of the DNA of an
individual afflicted with a mood disorder or related disorder; and

(5) identifying said gene which is associated with said mood disorders or related
disorders.
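
Step (2) above relies on probes of not less than 14 contiguous bases, or their complement, to detect overlaps amongst sub-clones. Where sub-clone sequences are available, the same idea can be sketched in silico as follows. This is only an illustration under stated assumptions: the function names, the probe and the clone sequences are hypothetical, exact matching is used where a hybridization experiment would tolerate mismatches, and "complement" is interpreted as the reverse complement.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shares_probe(probe: str, clone: str, min_len: int = 14) -> bool:
    """True if `clone` contains at least `min_len` contiguous bases of
    `probe` or of its reverse complement (exact matches only)."""
    clone = clone.upper()
    for p in (probe.upper(), revcomp(probe)):
        for i in range(len(p) - min_len + 1):
            if p[i:i + min_len] in clone:
                return True
    return False


def overlap_map(probes: dict, clones: dict, min_len: int = 14) -> dict:
    """Map each probe name to the sorted names of the sub-clones it detects.

    Sub-clones detected by the same probe are candidates for overlap."""
    return {
        pname: sorted(
            cname for cname, cseq in clones.items()
            if shares_probe(pseq, cseq, min_len)
        )
        for pname, pseq in probes.items()
    }


# Hypothetical end-sequence probe and three hypothetical sub-clones.
probe = "ACGTTGCAAGCTTGACCGTA"
clones = {
    "a": "GGG" + probe[3:18] + "TTT",     # carries 15 bases of the probe
    "b": "T" * 40,                        # unrelated sequence
    "c": "CC" + revcomp(probe) + "AA",    # carries the opposite strand
}
print(overlap_map({"end1": probe}, clones))
```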

If the cloned YAC DNA is sequenced, computer analysis can be used to establish the 
presence of relevant genes. Techniques such as homology searching and exon 
prediction may be applied. 
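
As one example of such computer analysis, the CpG island criteria mentioned in item (c) above can be screened for in a sliding window. The sketch below is not the LCP or CPG program referred to elsewhere in this document. The GC threshold of 60% follows the text; the window size, step size and the observed/expected CpG ratio cut-off of 0.6 are assumptions borrowed from commonly used CpG island definitions, and the test sequence is synthetic.

```python
def cpg_stats(window: str):
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")                       # observed CpG dinucleotides
    gc_fraction = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0        # expected CpG count by chance
    ratio = cpg / expected if expected else 0.0
    return gc_fraction, ratio


def candidate_windows(seq: str, window: int = 200, step: int = 50,
                      min_gc: float = 0.60, min_ratio: float = 0.6):
    """List (start, GC fraction, obs/exp ratio) for windows passing both cut-offs."""
    hits = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        gc, ratio = cpg_stats(seq[start:start + window])
        if gc > min_gc and ratio > min_ratio:
            hits.append((start, round(gc, 3), round(ratio, 3)))
    return hits


# Synthetic sequence: AT-rich flanks around a 200 bp CG-rich core.
seq = "AT" * 100 + "CG" * 100 + "AT" * 100
for hit in candidate_windows(seq):
    print(hit)
```

Adjacent passing windows would normally be merged into a single candidate island before any wet-lab follow-up.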

Once a candidate gene has been isolated in accordance with the methods of the
invention, more detailed comparisons may be made between the gene from a normal
individual and one afflicted with a mood disorder such as a bipolar spectrum disorder.
For example, there are two methods, described as "mutation testing", by which a
mutation or polymorphism in a DNA sequence can be identified. In the first, the DNA
sample may be tested for the presence or absence of one specific mutation, but this
requires knowledge of what the mutation might be. In the second, a sample of DNA is
screened for any deviation from a standard (normal) DNA. This latter method is more
useful for identifying candidate genes where a mutation is not identified in advance. In
addition, the following techniques may be further applied to a gene identified by the
above-described methods to identify differences between genes from normal or healthy
individuals and those afflicted with a mood disorder or related disorder:

(a) Southern blotting techniques: a clone is hybridized to nylon membranes containing
genomic DNA of patients and healthy individuals digested with different restriction
enzymes. Large differences between patients and healthy individuals can be
visualized using a radioactive labelling protocol;

(b) heteroduplex mobility in polyacrylamide gels: this technique is based on the fact
that the mobility of heteroduplexes in non-denaturing polyacrylamide gels is less than
the mobility of homoduplexes. It is most effective for fragments under 200 bp;

(c) single-strand conformational polymorphism analysis (SSCP or SSCA): single
stranded DNA folds up to form complex structures that are stabilized by weak
intramolecular bonds. The electrophoretic mobilities of these structures on
non-denaturing polyacrylamide gels depend on their chain lengths and on their
conformation;

(d) chemical cleavage of mismatches (CCM): a radiolabeled probe is hybridized to the
test DNA, and mismatches detected by a series of chemical reactions that cleave one
strand of the DNA at the site of the mismatch. This is a very sensitive method and can
be applied to kilobase-length samples;

(e) enzymatic cleavage of mismatches: the assay is similar to CCM, but the cleavage is
performed by certain bacteriophage enzymes;

(f) denaturing gradient gel electrophoresis: in this technique, DNA duplexes are forced
to migrate through an electrophoretic gel in which there is a gradient of increasing
amounts of a denaturant (chemical or temperature). Migration continues until the DNA
duplexes reach a position on the gel wherein the strands melt and separate, after which
the denatured DNA does not migrate much further. A single base pair difference
between a normal and a mutant DNA duplex is sufficient to cause them to migrate to
different positions in the gel;

(g) direct DNA sequencing. 

It will be appreciated that with respect to the methods described herein, in the step of
detecting differences between coding regions from the YAC and the DNA of an
individual afflicted with a mood disorder or related disorder, the said individual may
be anybody with the disorder and not necessarily a member of family MAD31.

In accordance with further aspects the present invention provides an isolated human
gene and variants thereof associated with a mood disorder or related disorder and
which is obtainable by any of the above described methods, an isolated human protein
encoded by said gene and a cDNA encoding said protein.

Once a gene has been identified, a number of methods are available to determine the
function of the encoded protein. These methods are described by Eisenberg et al
(Nature vol. 405, 15 June 2000), which is herein incorporated by reference. One method
is a computational method that reveals functional linkages from genome
sequences and is called the gene neighbor method. If in several genomes the genes that
encode two proteins are neighbors on the chromosome, the proteins tend to be
functionally linked. This method can be powerful in uncovering functional linkages in
prokaryotes, where operons are common, but also shows promise for analysing
interacting proteins in eukaryotes.

Examples: 
Example 1 

A: Triplet repeat isolation

CCG/CGG YAC fragmentation vectors were constructed by cloning blunted
(CCG)10/(CGG)10 adapters into the blunted SphI site of the previously described pDV1
basic vector (Del-Favero et al 1999). Sequencing determined that fragmentation vectors
pDVCCG and pDVCGG have the adapter sequence in a 5'-(CCG)10-3' and a
5'-(CGG)10-3' orientation respectively.

Using these vectors, CCG/CGG repeats and flanking sequences were isolated by YAC 
fragmentation as described(Del-Favero et al 1999). 

B: Characterisation of the structure of the NCAG1 gene.

I.M.A.G.E. Consortium [LLNL] cDNA Clones (Lennon et al 1996)
IMAGp998A136826Q2, IMAGp998A154307Q2, IMAGp998B194346Q2,
IMAGp998D126826Q2, IMAGp998D193628Q2, IMAGp998F131866Q2,
IMAGp998H201815Q2, IMAGp998K235214Q2, IMAGp998L153967Q2 and
IMAGp998N06839Q2 were ordered at RZPD Deutsches Ressourcenzentrum für
Genomforschung GmbH (Heubnerweg 6, 14059 Berlin-Charlottenburg, Germany).
Cultures starting from single colonies were grown and plasmids were prepared by the
Wizard Plus SV Minipreps DNA Purification System (Promega, Madison, WI). DNA
sequencing was performed with the dideoxynucleotide sequencing method using a
DNA sequencing kit (Perkin-Elmer, Foster City, CA) and analysed by an ABI PRISM 377
DNA Sequencer (Perkin-Elmer, Foster City, CA) or an ABI PRISM 3700 DNA Analyser
(Perkin-Elmer, Foster City, CA).

For the RT-PCR reactions, mRNA from SHSY-5Y cells was prepared using the
µMACS mRNA Isolation Kit (Miltenyi Biotec, Bergisch Gladbach, Germany). After
DNase I treatment (Promega, Madison, WI), the RT reaction was primed with
oligo(dT) primers and performed with the SuperScript Preamplification System for First
Strand cDNA Synthesis (GibcoBRL, N.V. Life Technologies, Merelbeke, Belgium).
First-strand cDNA was used in long-range PCR reactions with TaKaRa LA Taq (Takara Shuzo Co.,
Otsu, Shiga, Japan). PCR products were reamplified with nested primers and
sequenced as described above.

C: Characterisation of the expression pattern of the NCAG1 gene. 

Genepool cDNA (Invitrogen, Carlsbad, CA) from brain, fetal brain, placenta, liver,
testis and lung was used as a cDNA mapping panel. The Human Brain Multiple Tissue
Northern (MTN) Blot IV (Clontech, Palo Alto, CA) was used for radioactive
hybridisation in the accompanying ExpressHyb solution according to the instructions of
the manufacturer. A zooblot was prepared by digesting 10 µg genomic DNA to completion
with HindIII, running it on a 1% agarose gel in TAE and performing a Southern blot. A
PCR product containing the ORF of the NCAG1 gene was radioactively labelled and
hybridised at 65 °C.

D: Mutation analysis of the NCAG1 gene. 

Overlapping PCR products of approximately 600 bp were generated and sequenced as
described above. Both identified polymorphisms were detected by digesting the PCR
product with HinfI and electrophoresing the fragments on precast ExcelGel gels on a
Multiphor II electrophoresis system (Amersham Pharmacia Biotech AB, Uppsala,
Sweden).
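
The PCR-RFLP principle used above can be illustrated with a minimal sketch. HinfI recognises G^ANTC, so a base change at one of the fixed positions of a site creates or destroys a cut and changes the fragment pattern seen on the gel. The two 40 bp amplicons below are hypothetical and are not taken from SEQ ID NO: 1; the text does not state which allele of either polymorphism is cut.

```python
import re

# HinfI recognition sequence G^ANTC (N = any base); the enzyme cuts after the G.
HINFI = re.compile(r"GA[ACGT]TC")

def hinfi_fragments(seq):
    """Return fragment lengths after complete HinfI digestion of a linear sequence."""
    seq = seq.upper()
    cuts = [m.start() + 1 for m in HINFI.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Hypothetical amplicons that differ at a single base inside a HinfI site.
FLANK_5 = "AT" * 5
FLANK_3 = "AT" * 12 + "A"
allele_c = FLANK_5 + "GACTC" + FLANK_3   # site intact
allele_t = FLANK_5 + "GACTT" + FLANK_3   # C to T change destroys the site

print(hinfi_fragments(allele_c))  # two fragments
print(hinfi_fragments(allele_t))  # one uncut fragment
```

A heterozygous sample would show the fragments of both alleles in the same lane.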

E: CCG/CGG YAC fragmentation

CCG/CGG YAC fragmentation was applied to YACs 961h9, 766f12 and
907e1 (Goossens et al 2000). Size determination by Pulsed Field Gel Electrophoresis
(PFGE) and Southern blot hybridisation resulted in 33 sets of equally sized fragmented
YAC clones. Sequencing of 112 fragmented YAC ends identified seven (out of 33) sets
of fragmented YACs with identical end sequences resulting from a specific
homologous recombination. One set (CCG7) was the result of fragmentation in the
(CGG)6 repeat in the 5' UTR of the CAP2 gene (GenBank acc. No L40377). A second
set (CCG6) contained a (CCG)2 repeat and a third (CCG4) an imperfect CCCCG
repeat. The triplet repeat in the 5' UTR of the CAP2 gene was already shown not to be
associated with BP disorder (Goossens et al 2000). The size of CCG4 was analyzed in
12 BP and 12 UP patients, but only one allele was detected. The size of CCG6 was not
analyzed since it was too small to be polymorphic.

In depth analysis showed that three (CCG3, GenBank acc No CCG4, GenBank acc
No... and CCG6, GenBank acc No ...) of the seven sequences had high CG content
(70-80 %) and high CpG content (15-20 CpGs in 200 bp) but no additional CCG/CGG
repeats were found. Primer pairs for these potential CpG islands were used to
determine their position on the YAC contig (Figure 1). BLASTN analysis (Altschul et al
1990) resulted for both CCG4 and CCG6 in hits with sequences of RPCI-11 BACs.
CCG4 gave a hit in a contig of 27150 bp of the working draft sequence of RPCI-11
BAC 29013 (GenBank acc No AC022662, GI: 7249117). CCG6 was part of the
complete sequence of RPCI-11 BAC 793J2 (GenBank acc No AC009802).

F: Identification and in silico characterisation of NCAG1 gene. 

To find genes possibly associated with the potential CpG islands CCG4 and CCG6,
their surrounding BAC sequences were analysed using bioinformatic tools. Hence the
27150 bp contig of BAC 29013 and the complete sequence of BAC 793J2 were sent
for analysis to the Rummage High-Throughput Sequence Annotation Server
(http://gen100.imb-jena.de/rummage/index.html).

First, LCP (Huang 1994) and CPG (Larsen et al 1992) recognized CpG islands
containing CCG4 and CCG6 of 1.2 kb and 0.4 kb respectively, confirming their
potential role as CpG islands.

In a next step, exon prediction programs Grail (Uberbacher & Mural 1991) and
Genscan (Burge & Karlin 1997) both predicted the presence of a 3639 bp exon, 1.5 kb
downstream of the 1.2 kb CpG island containing CCG4. This predicted exon
contains an open reading frame (ORF) which starts at an ATG start codon with an
almost perfect Kozak sequence and ends with a TAA stop codon. Other predicted
features are a transcription start site (TSS) 2352 bp upstream of the ORF (score 76.6
by Proscan (Prestridge 1995)) and polyadenylation signals at 3032, 3247, 4364, 5338
and 8266 bp downstream of the ORF (respective scores of 4.79, 3.83, 4.94, 4.93 and 6.27
by POLYAH (Salamov & Solovyev 1997)) (Figure 2a).
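
What "an ORF that starts at an ATG and ends with a stop codon" means in practice can be sketched as a simple forward-strand scan. This is only an illustration and not the model used by Grail or Genscan, which score exons statistically; the function name and the toy sequence are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Return (start, end, length) of the longest ATG...stop ORF on the forward strand.

    start is 0-based, end is exclusive and includes the stop codon.
    Returns None when no complete ORF is present.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if best is None or length > best[2]:
                    best = (start, i + 3, length)
                start = None                   # look for the next ORF in this frame
    return best

# Toy sequence (hypothetical): ATG AAA TTT GGG TAA embedded in flanking bases.
print(longest_orf("CCATGAAATTTGGGTAACC"))
```

Applied to SEQ ID NO: 1, such a scan would be expected to return the coding region annotated in the sequence listing at (1507)..(5142), i.e. 3636 nucleotides including the stop codon.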

BLASTN (Altschul et al 1990) alignment searches against dbEST revealed
significant homology (> 97 %) to 21 human ESTs (Table 1, Figure 2b).
TBLASTX (Altschul et al 1997) searches of the GenBank non-redundant database (nr)
with the ORF showed extensive homology at the protein level with SART-2 (GenBank
Acc No NP_037484), a squamous cell carcinoma antigen recognized by T-cells (Nakao
et al 2000). Weaker homology was found with a series of sulfotransferases. Analysis of
the 1212 amino acid long sequence of the translated ORF by SMART (Simple Modular
Architecture Research Tool, V3.1) (Schultz et al 2000) did not identify any known
domains apart from a cleavable signal peptide at positions 1-20 and two transmembrane
segments at positions 771-791 and 800-820. InterPro reported no significant hits,
although a BLASTP (Altschul et al 1997) search of the ProDom database showed homology
between the NCAG1 gene and the chondroitin-6-sulfotransferase domain (ProDom Acc
No PD042460).

G: Characterisation of the structural organisation of the NCAG1 gene.

Based on the BLASTN EST hits, I.M.A.G.E. Consortium [LLNL] cDNA
clones (Lennon et al 1996) were ordered and sequenced. The sequences aligned with
the genomic sequence in the presumed 5' UTR (untranslated region), the ORF and the
presumed 3' UTR, indicating that these sequences are indeed transcribed (Figure 2c).
Alignment of the sequence of IMAGp998B194346Q2 with the genomic sequence
showed that an 865 bp fragment was missing in the cDNA. A detailed analysis of the
flanking sequences revealed the presence of consensus acceptor and donor splice sites,
suggesting that this fragment is probably an intron. Clone IMAGp998D193628Q2
also lacked a fragment of 1.9 kb when compared to the genomic sequence, but consensus
splice sites were absent. Two clones, IMAGp998D193628Q2 and
IMAGp998A136826Q2, terminated exactly at the predicted polyadenylation signal, 4.4
kb downstream of the ORF. Sequences of clones IMAGp998A154307Q2,
IMAGp998D126826Q2 and IMAGp998F131866Q2 did not align with the genomic
sequence and were not analysed further.

Since cDNA clone sequencing did not result in a continuous sequence of the transcript,
primers were designed and used for RT-PCR experiments. Sequencing of different
overlapping RT-PCR products confirmed the presence of a transcript of at least 9 kb,
containing the ORF of the predicted exon, linked to the presumed 5' and 3' sequences
(Figure 2d). The 5' intron of 865 bp was confirmed and the 3' UTR was extended
up to the predicted polyadenylation signal, 4.4 kb downstream of the ORF.

H: Characterisation of the expression pattern of the NCAG1 gene.

To investigate the expression profile of the NCAG1 gene, a long-range PCR spanning
the ORF was optimised on genomic DNA and applied to a cDNA mapping panel. This
showed that the fragment was present in cDNA from brain, fetal brain, placenta and
liver but could not be detected in cDNA from testis and lung. More detailed
information on the expression in the brain was obtained by Northern blot hybridisation
showing expression of a > 9.5 kb transcript in all investigated tissues (lung, placenta,
small intestine, liver, kidney, skeletal muscle, heart, brain, uterus, trachea, thyroid,
stomach, spinal cord, prostate, mammary gland, lymph node, brain (whole), bladder,
adrenal gland, amygdala, caudate nucleus, corpus callosum, hippocampus, substantia
nigra, thalamus and total brain).

Stringent Zooblot hybridisation experiments showed the presence of homologous 
sequences in the genomic DNA of other mammals like dog, pig, mouse, donkey, horse 
and sheep. 

I: Mutation analysis of the NCAG1 gene. 

Since this novel CpG-associated gene is brain-expressed and located in the
chromosome 18q21.3-q23 BP candidate region, a mutation analysis of the ORF was
performed on 3 patients and 1 escapee of the chromosome 18 linked family MAD31. In
this way two single nucleotide polymorphisms were identified. The first is a C to T
transition at position 2017 of the ORF, changing amino acid (AA) 673 from proline to
serine. This polymorphism was only found in the healthy control. The second
polymorphism was found in all three patients. It was also a C to T transition, located at
position 2824 and changing AA 942 from proline to serine. Analysis of this
polymorphism in family MAD31 showed that the T-allele was present on the disease
haplotype.
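
The relation between the ORF positions and the amino acid numbers given above can be checked with a short sketch. Positions are 1-based and counted from the A of the ATG. The codon `CCC` for both residues is read from the sequence listing as reproduced in this document, which is an assumption about the accuracy of that transcription; the mini codon table holds only the two codons needed here.

```python
def codon_of(orf_pos):
    """Map a 1-based ORF nucleotide position to (codon number, position in codon)."""
    return (orf_pos - 1) // 3 + 1, (orf_pos - 1) % 3 + 1

def substitute(codon, pos_in_codon, new_base):
    """Return the codon with the base at the given 1-based position replaced."""
    bases = list(codon)
    bases[pos_in_codon - 1] = new_base
    return "".join(bases)

# Only the codons needed for this example, not a full genetic code table.
CODON_TO_AA = {"CCC": "Pro", "TCC": "Ser"}

for pos in (2017, 2824):
    codon_number, pos_in_codon = codon_of(pos)
    mutant = substitute("CCC", pos_in_codon, "T")
    print(pos, codon_number, pos_in_codon,
          CODON_TO_AA["CCC"], "->", CODON_TO_AA[mutant])
```

Both transitions fall on the first base of their codon, which is why a single C to T change converts proline (CCN) into serine (TCN).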

Both polymorphisms were analysed in an association study on 92 BP patients and 92
age, sex and ethnicity matched controls by PCR-RFLP analysis. The P673S
polymorphism turned out to be a frequent polymorphism with both alleles roughly
equally present. The P942S polymorphism, however, was found to be a rare
polymorphism, with the T allele only present in 3 BP patients and in 2 controls.
Statistical analysis showed the control population was in Hardy-Weinberg equilibrium
for both polymorphisms. No alleles, genotypes or haplotypes were found to be
associated with BP disorder.
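
A Hardy-Weinberg check of the kind mentioned above can be sketched as a chi-square comparison of observed and expected genotype counts. The genotype counts in the example are hypothetical, since the text does not report them, and the text does not state which test was actually used.

```python
def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Return (frequency of allele A, chi-square statistic) for genotype counts.

    The statistic has 1 degree of freedom for a biallelic marker; values
    above 3.84 indicate deviation from equilibrium at the 0.05 level.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, chi2

# Hypothetical counts for 92 controls with two roughly equally frequent alleles.
p, chi2 = hardy_weinberg_chi2(23, 46, 23)
print(p, chi2)
```

For a rare allele such as 2824T the expected number of homozygotes is far below one, so an exact test is generally preferred over the chi-square approximation in that case.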

Since triplet repeat fragmentation was proven to be a valid method for the
region-specific isolation of triplet repeats (Goossens et al 2000), we applied it to the
chromosome 18q21.33-q23 BP candidate region for the isolation of CCG/CGG repeats.
We therefore first had to construct a new set of fragmentation vectors, pDVCCG and
pDVCGG. Fragmentation experiments with these vectors resulted in transformation
and fragmentation efficiencies in the same range as obtained with the CAG/CTG
fragmentation vectors pDVCAG and pDVCTG (data not shown). Application of
CCG/CGG fragmentation to YAC 961h9 resulted in the isolation of the (CGG)6 repeat
in the 5' UTR of CAP2. This repeat is adjacent to the (CAG)6 repeat previously
reported (Goossens et al 2000). There, it was shown that this (CGG)6(CAG)6 repeat is
polymorphic but neither expanded in BP cases nor associated with BP disorder. Taken
together, the CCG/CGG YAC fragmentation data do not support CCG/CGG repeats
as disease-causing agents in chromosome 18q21.33-q23 linked BP disorder.

On the other hand, fragmentation experiments resulted in three sequences (CCG3,
CCG4 and CCG6) with high CG (70-80 %) and CpG content but containing no
CCG/CGG repeat. CpG islands are usually defined as regions of DNA of more than
200 bases that have a CG content above 50 % and a ratio of observed versus expected
CpGs close to that statistically expected. Therefore, CCG3, CCG4 and CCG6 can be
considered as potential CpG islands. Analysis of the surrounding sequences of CCG4 and
CCG6 with LCP (Huang 1994) and CPG (Larsen et al 1992) confirmed that the
fragmentation had in both cases indeed occurred in a CpG island. Since CpG islands are
strongly associated with genes, more specifically housekeeping and widely expressed
genes, these three sequences are likely to be located near this class of genes.
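
The two quantities in the CpG island definition above can be computed as in the following sketch. The observed/expected ratio uses the commonly applied formula (number of CpG x length) / (number of C x number of G); whether LCP and CPG use exactly this formula is not stated in the text and is assumed here only for illustration. The function name and test sequences are hypothetical.

```python
def cpg_stats(seq):
    """Return (CG content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")                      # CpG dinucleotides cannot overlap
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (c * g) / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp

# Toy sequences: a pure CG repeat and an AT repeat without any CpG.
print(cpg_stats("CGCGCGCGCG"))
print(cpg_stats("ATATATAT"))
```

In bulk vertebrate DNA the ratio is well below 1 because CpG dinucleotides are depleted, whereas in CpG islands it approaches the statistically expected value.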

In the search for genes possibly associated with the isolated CpG islands, exon
prediction programs Grail (Uberbacher & Mural 1991) and Genscan (Burge & Karlin
1997) both predicted the presence of a 3.6 kb exon downstream of the largest CpG
island isolated. Two facts argued strongly against a false positive prediction. The first
was that these two programs, based on different models, predicted exactly the same
exon. The second was the mere presence in genomic DNA of this ORF continuing for
3.6 kb and starting with a Kozak consensus ATG. Additional evidence that this exon
was indeed transcribed was found in the fact that a series of ESTs had very high
homologies (97-100 %) with sequences in and surrounding the ORF. In a next step, this
evidence was extended by sequencing of the cDNA clones from which the ESTs
originated. The EST sequences were extended and corrected and the homologies
increased to 99-100 %. The fact that the cDNA clones originated from different cDNA
libraries (Table 1) indicated that the gene was expressed in different tissues. RT-PCR
and Northern blot experiments provided the final confirmation that this ORF was
widely expressed, a usual characteristic of a CpG-associated gene.

cDNA clone sequencing resulted in the complete sequence of seven human cDNA clones
aligning with NCAG1. In two cases a piece of genomic DNA was missing in the cDNA
sequence. Clone IMAGp998B194346Q2 lacked an 865 bp fragment (Figure 2c). Since
this fragment was flanked by splice donor and acceptor consensus sequences, and since
the fragment was also missing in the RT-PCR products, enough evidence was gathered
to call it an intron. Clone IMAGp998D193628Q2 also lacked a 1.4 kb fragment
compared to the genomic sequence. In this case no consensus splice sites were present.
Moreover, cDNA clones IMAGp998L153967Q2 and IMAGp998A136826Q2 contain
sequences that are located in the missing fragment of IMAGp998D193628Q2
(Figure 2c). These data, together with the fact that EST AA442543 is located entirely in
the missing fragment (Figure 2b) and the presence of this fragment in the RT-PCR
products (Figure 2d), indicate that this missing fragment is more likely an artifact than
an intron.

EST homologies and cDNA clone sequencing showed that a series of cDNA clones
terminated at a predicted polyadenylation signal, 4.3 kb downstream of the ORF or
10.3 kb downstream of the predicted TSS. If the 5' intron of 865 bp is taken into
account, the size of the transcript will be approximately 9.5 kb, which is the size of
the transcript recognized in the Northern blot experiment.

At the protein level, a cleavable signal peptide and two transmembrane domains are
predicted. If this is correct, both the N-terminal and C-terminal sides will be on the
same side of the membrane in which the protein is embedded. The strong homology with
the SART-2 protein is significant, but it does not add more clues as to potential
functions of the novel protein.

The 2824T allele, present on the disease haplotype in the chromosome 18 linked family
MAD31, is a very rare allele with a frequency of 0.03. Statistical analysis in
an association sample therefore loses much of its power, leaving open the possibility
that this allele confers an increased risk for BP disorder.

CLAIMS

What is claimed is: 

1. An isolated nucleic acid comprising the nucleotide sequence of SEQ ID NO: 1. 

2. An isolated nucleic acid consisting essentially of the nucleotide sequence of SEQ 
ID NO: 1. 

3. An isolated nucleic acid comprising a nucleotide sequence that encodes the
amino acid sequence of SEQ ID NO: 2.

4. An isolated nucleic acid comprising the nucleotide sequence of SEQ ID NO: 3. 

5. An isolated nucleic acid consisting essentially of the nucleotide sequence of SEQ 
ID NO: 3. 

6. An isolated nucleic acid consisting of the nucleotide sequence of SEQ ID NO: 1 or a
contiguous fragment thereof wherein said isolated nucleic acid encodes a polypeptide
having biological activity of bipolar disorder protein.

7. An isolated nucleic acid that hybridizes under high stringency conditions to a
nucleic acid having a sequence complementary to the nucleotide sequence of SEQ ID
NO: 1, wherein said isolated nucleic acid encodes a polypeptide having biological
activity.

8. An isolated nucleic acid that encodes a polypeptide having the biological activity,
said isolated nucleic acid consisting of a nucleotide sequence that is at least 90%
identical to the nucleotide sequence of SEQ ID NO: 1.

9. An isolated nucleic acid consisting of the nucleotide sequence of SEQ ID NO: 3 or a 
contiguous fragment thereof wherein said isolated nucleic acid encodes a polypeptide 
having biological activity. 

10. An isolated nucleic acid that hybridizes under high stringency conditions to a
nucleic acid having a sequence complementary to the nucleotide sequence of SEQ ID 
NO: 3, wherein said isolated nucleic acid encodes a polypeptide having the biological 
activity. 

11. An isolated nucleic acid that encodes a polypeptide having the biological activity,
said isolated nucleic acid consisting of a nucleotide sequence that is at least 90% 
identical to the nucleotide sequence of SEQ ID NO: 3. 

12. Isolated and substantially purified protein encoded by the nucleic acid of Claim 6. 

13. Isolated and substantially purified viral inhibitory protein 1 and 2 encoded by the 
nucleic acid of claim 9. 

14. Isolated and substantially purified viral inhibitory protein having the amino acid 
sequence of SEQ ID NO: 2. 

15. Isolated and substantially purified protein having an amino acid sequence that is at
least 90% identical to the sequence of SEQ ID NO: 2.

16. Isolated and substantially purified protein having an amino acid sequence that is at
least 90% identical to the sequence of SEQ ID NO: 4.

17. Isolated and substantially purified protein having an amino acid sequence that is at 
least 90% identical to the sequence of SEQ ID NO: 4. 

18. A vector comprising the nucleic acid of claim 1. 

19. A vector comprising the nucleic acid of claim 4. 

20. A vector comprising the nucleic acid of claim 6 operably linked to an expression
control sequence.

21. A host cell comprising the nucleic acid of claim 6. 



22. A host cell comprising the vector of Claim 20. 

23. A method of making protein 1 and 2 comprising:

a) introducing the nucleic acid of claim 6 into a host cell;

b) maintaining said host cell under conditions whereby said nucleic acid is expressed to
produce protein;

c) recovering said protein.

24. A method of making protein comprising:

a) introducing the nucleic acid of claim 9 into a host cell;

b) maintaining said host cell under conditions whereby said nucleic acid is expressed to
produce protein;

c) recovering said protein.

25. A method of making protein comprising:

a) introducing the nucleic acid of Claim 16 into a host cell;

b) maintaining said host cell under conditions whereby said nucleic acid is expressed to
produce viral inhibitory protein;

c) recovering said protein.

26. A composition comprising purified protein and a carrier.

27. The composition according to claim 26 which further comprises viral inhibitory
protein 2.

SEQUENCE LISTING

<110> Janssen Pharmaceutica NV

<120> Novel Brain Expressed Gene and Protein associated with
Bipolar Disorder

<130> NCAG1

<140>
<141>

<160> 4

<170> PatentIn Ver. 2.1

<210> 1
<211> 9528
<212> DNA
<213> Homo sapiens

<220>
<221> CDS encoding Human NCAG1 protein
<222> (1507)..(5142)

<400> 1

acctgctttc ggccccgccc cgcccgccgc cggcctgctc acggctcctc ccgtcctccc  60
cgaagccccg cctctgaccc cgccctgtcc tgtctccgtc ccgccccacg cccgccagcc  120
agcgtcgctg tctctcgcct tccctgaggc cccgccttca gccccgcctt caaccccgcc  180
ccgtcctgcc tccgccccgc ccccgcttgc cggcccgcgt cgccgtctct caccctcccc  240
gggctgcgcg gccggagctg gcacagagga tcctcggccg cggcgacatc accgcctggg  300
gacgcgggcg ctgctctgga tacggcgcca ccgagagaac ccgccgcccg cgggtctctg  360
tcctgcggtc cgtggttgcc cccacaagcg tccggcgttt cctgagggcg ggcgtgtccg  420
ggccgtgcgg gtcgcgggga ccgagcgcgg ctgaggagac cgagcctggg gcagcgcctg  480
ccgtagcgcg ggagacgacg cgggggtctt gcggagcccc gcgggagcct ggcccgccgt  540
gcagagcagt tttctggaac tctccacctc cgtctccctt ggggcccagt gcggcgccga  600
gcccccgtcg ggatctgcct gagaaagtgt catgaaaaaa gagcagaaga gagacctcac  660
tgttgctgaa aggggaattt tctttcgccc gttggcggtt acttcatgat cggacgagaa  720
gtatctaggt gactgaagat attccatttt tatgtttgta cacatgaagc tgataaaaga  780
agatgtgaac atgatttctc tttgtcataa taggctgatg agtaagtaag cctgaaaaat  840
atttgaaatg aaggcaagaa ttttgaattt ttaaaaacca actaagactt tgatcacttg  900
ttgaggatgt ttctctctca taaatgaaag aaaaacgtat tcacaagaca agaagtataa  960
aaagttgaga ggaatgacaa ctgagtccac tcactcgaag aatgtcagta cttcatcatc  1020
ttctttgggc aaacatacac aaatgcatca tacatgtgtg gtgagcttat caccagtgat  1080
ggttttctgt gctagaaatg actcttaatt tgaattttgg agtgcttttt ctcttttttt  1140
acaatgtgtg ttccaactct ttgtgttaaa tagatttaag taaaggaggt aaatgctaaa  1200
ttcatagtgt tttttacctg tatcacttcc ctgtgtatta tggaaaaatt agagatttta  1260
acgttattca aagttttact ggaagcaaaa ctgtgccagg gacagagata tacaatttaa  1320
gtttctcttt ttggcaactg cacttgctta naatgtactg aatgtcagct ggatttcaca  1380
gcatatcaga tttacagtct ttgtcttatc aaggccttta ctgtatgttt tatactaacc  1440
agatgggaaa cacattgagc atcatatctg acatgtatgc ctaagggagg agctccccca  1500

tggatc atg gcg tta atg ttt aca gga cat tta cta ttc tta gca tta  1548
       Met Ala Leu Met Phe Thr Gly His Leu Leu Phe Leu Ala Leu
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10 
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ttt 


get 


ttc 


tct 


act 
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gag 


gaa 


tct 


gtg 


age 


aat 


tat 
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1596 


20 


Leu 
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Phe 


Ser 


Thr 


Phe 


Glu 


Glu 


Ser 


Val 


Ser 


Asn 


Tyr 
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15 
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25 










3 0 
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gtt 
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gat 


cag 


ttt 
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aaa 


1644 
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uin 
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get 


gga 


gaa 
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caa 


aag 


1740 
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Ser 


Leu 


Tyr 


Phe 


Asp 


Ala 


Gly 


Glu 


He 


Gin 


Ala 


Met 


Arg 


Gin 


Lys 










65 










^ a 
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35 


tct 
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ttg 


cat 


Ctt 


ttt 
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get 


ate 
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agt 


gca 


gtg 


1788 




Ser 
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Ala 
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Leu 
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Leu 


Phe 


Arg 


Ala 


He 


Arg 


Ser 


Ala 


Val 








OA 
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O J 










90 
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gtt 
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ctg 


tec 


aac 
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tac 


tac 


eta 


cct 


cca 
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aag 


cat 


1836 


40 


Thr 


Val 
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Thr 


Tyr 


Tyr 


Leu 


Pro 


Pro 


Pro 


Lys 


His 






95 










100 










105 
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get 
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ttt 


get 
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tgg 
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tat 
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Glu 


He 


Tyr 


Gly 


Asn 


Asn 


Leu 


Pro 




45 
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ttg 
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tgt 


ttg 


tta 
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gaa 


gac 


aaa 


gtt 


gee 


ttt 


1932 




Pro 


Leu 


Ala 


Leu 


Tyr 


Cys 


Leu 


Leu 


Cys 


Pro 


Glu 


Asp 


Lys 


Val 


Ala 


Phe 












130 
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140 
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gaa 


ttt 


gtc 


ttg 


gaa 


tat 


atg 


gac 


agg 


atg 


gtt 


ggc 


tac 


aaa 


gac 


tgg 


1980 




Glu 


Phe 


Val 


Leu 


Glu 


Tyr 


Met 


Asp 


Arg 


Met 


Val 


Gly 


Tyr 


Lys 


Asp 


Trp 










145 










150 










155 










55 


eta 


gta 


gag 


aat 


gca 


cca 


gga 


gat 


gag 


gtt 


cca 


att 


ggc 


cat 


tec 


tta 


2028 




Leu 


Val 


Glu 


Asn 


Ala 


Pro 


Gly 


Asp 


Glu 


Val 


Pro 


He 


Gly 


His 


Ser 


Leu 








160 










165 










170 














aca 


ggt 


ttt 


gee 


act 


gee 


ttt 


gac 


ttt 


tta 


tat 


aac 


tta 


tta 


gat 


aat 


2076 


60 


Thr 


Gly 


Phe 


Ala 


Thr 


Ala 


Phe 


Asp 


Phe 


Leu 


Tyr 


Asn 


Leu 


Leu 


Asp 


Asn 






175 










180 










185 










190 






cat 


cga 


aga 


caa 


aaa 


tac 


ctg 


gaa 


aaa 


ata 


tgg 


gtt 


att 


act 


gag 


gaa 


2124 
His Arg Arg Gln Lys Tyr Leu Glu Lys Ile Trp Val Ile Thr Glu Glu
195 200 205

atg tac gag tat tcc aag gtc cgc tca tgg ggc aaa cag ctt ctc cat  2172
Met Tyr Glu Tyr Ser Lys Val Arg Ser Trp Gly Lys Gln Leu Leu His
210 215 220

aac cac caa gcc act aat atg ata gca tta ctc aca ggg gcc ttg gtg  2220
Asn His Gln Ala Thr Asn Met Ile Ala Leu Leu Thr Gly Ala Leu Val
225 230 235

act gga gta gat aaa gga tct aaa gca aat ata tgg aaa cag gct gta  2268
Thr Gly Val Asp Lys Gly Ser Lys Ala Asn Ile Trp Lys Gln Ala Val
240 245 250

gtg gat gtc atg gaa aag aca atg ttt cta ttg aat cat att gtt gat  2316
Val Asp Val Met Glu Lys Thr Met Phe Leu Leu Asn His Ile Val Asp
255 260 265 270

ggt tct ttg gat gaa ggt gtg gcc tat gga agc tac aca gct aaa tcc  2364
Gly Ser Leu Asp Glu Gly Val Ala Tyr Gly Ser Tyr Thr Ala Lys Ser
275 280 285

gtc aca cag tat gtt ttt ctg gcc cag cgc cat ttt aat atc aac aac  2412
Val Thr Gln Tyr Val Phe Leu Ala Gln Arg His Phe Asn Ile Asn Asn
290 295 300

ttg gat aat aac tgg tta aag atg cac ttt tgg ttc tat tat gcc acc  2460
Leu Asp Asn Asn Trp Leu Lys Met His Phe Trp Phe Tyr Tyr Ala Thr
305 310 315

ctt tta cct ggc ttc caa aga act gtg ggt ata gca gat tcc aat tat  2508
Leu Leu Pro Gly Phe Gln Arg Thr Val Gly Ile Ala Asp Ser Asn Tyr
320 325 330

aat tgg ttt tat ggt cca gaa agc cag cta gtt ttc ttg gat aag ttc  2556
Asn Trp Phe Tyr Gly Pro Glu Ser Gln Leu Val Phe Leu Asp Lys Phe
335 340 345 350

atc tta aag aat gga gct gga aat tgg tta gct cag caa att aga aag  2604
Ile Leu Lys Asn Gly Ala Gly Asn Trp Leu Ala Gln Gln Ile Arg Lys
355 360 365

cac cga cct aaa gat gga ccg atg gtt cct tca act gcc caa agg tgg  2652
His Arg Pro Lys Asp Gly Pro Met Val Pro Ser Thr Ala Gln Arg Trp
370 375 380

agt act ctt cac act gaa tac atc tgg tat gat ccc cag ctc aca cca  2700
Ser Thr Leu His Thr Glu Tyr Ile Trp Tyr Asp Pro Gln Leu Thr Pro
385 390 395

cag cca cct gct gat tat ggt act gca aaa ata cac aca ttc cct aac  2748
Gln Pro Pro Ala Asp Tyr Gly Thr Ala Lys Ile His Thr Phe Pro Asn
400 405 410

tgg ggt gtg gtt act tat ggg gct ggg ttg cca aac aca cag acc aac  2796
Trp Gly Val Val Thr Tyr Gly Ala Gly Leu Pro Asn Thr Gln Thr Asn
415 420 425 430

acc ttt gtg tct ttt aaa tct ggg aag ctg ggg gga cga gct gtg tat  2844
Thr Phe Val Ser Phe Lys Ser Gly Lys Leu Gly Gly Arg Ala Val Tyr
435 440 445

gac ata gtt cat ttt cag cca tat tcc tgg att gat ggg tgg aga agt  2892
Asp Ile Val His Phe Gln Pro Tyr Ser Trp Ile Asp Gly Trp Arg Ser
450 455 460

ttt aac cca gga cat gag cat cca gat cag aac tca ttt act ttt gcc  2940
Phe Asn Pro Gly His Glu His Pro Asp Gln Asn Ser Phe Thr Phe Ala
465 470 475

ccc aat gga caa gta ttt gtt tct gaa gct ctc tat gga ccc aag ttg  2988
Pro Asn Gly Gln Val Phe Val Ser Glu Ala Leu Tyr Gly Pro Lys Leu
480 485 490

agc cac ctt aac aat gta ttg gtg ttt gct cca tca ccc tca agc cag  3036
Ser His Leu Asn Asn Val Leu Val Phe Ala Pro Ser Pro Ser Ser Gln
495 500 505 510

tgt aat aag ccc tgg gaa ggt caa ctg gga gaa tgt gcg cag tgg ctt  3084
Cys Asn Lys Pro Trp Glu Gly Gln Leu Gly Glu Cys Ala Gln Trp Leu
515 520 525

aag tgg act ggc gag gag gtt ggt gat gca gct ggg gaa ata atc act  3132
Lys Trp Thr Gly Glu Glu Val Gly Asp Ala Ala Gly Glu Ile Ile Thr
530 535 540

gcc tct caa cat ggg gaa atg gta ttt gtg agt ggg gaa gcc gtg tct  3180
Ala Ser Gln His Gly Glu Met Val Phe Val Ser Gly Glu Ala Val Ser
545 550 555

gct tat tct tca gca atg aga ctg aaa agt gta tat cgt gct ttg ctt  3228
Ala Tyr Ser Ser Ala Met Arg Leu Lys Ser Val Tyr Arg Ala Leu Leu
560 565 570

ctc tta aat tcc caa act ctg cta gtt gtt gat cat att gag agg caa  3276
Leu Leu Asn Ser Gln Thr Leu Leu Val Val Asp His Ile Glu Arg Gln
575 580 585 590

gaa gat tcc cca ata aat tct gtc agt gcc ttc ttt cat aat ttg gat  3324
Glu Asp Ser Pro Ile Asn Ser Val Ser Ala Phe Phe His Asn Leu Asp
595 600 605

att gat ttt aaa tat atc cca tat aag ttt atg aat agg tat aat ggt  3372
Ile Asp Phe Lys Tyr Ile Pro Tyr Lys Phe Met Asn Arg Tyr Asn Gly
610 615 620

gcc atg atg gat gtg tgg gat gca cat tac aaa atg ttt tgg ttt gat  3420
Ala Met Met Asp Val Trp Asp Ala His Tyr Lys Met Phe Trp Phe Asp
625 630 635

cat cat ggc aat agt ccc atg gcc agt ata cag gaa gca gag caa gct  3468
His His Gly Asn Ser Pro Met Ala Ser Ile Gln Glu Ala Glu Gln Ala
640 645 650

gct gaa ttt aaa aaa cga tgg act caa ttt gtt aat gtt act ttt cag  3516
Ala Glu Phe Lys Lys Arg Trp Thr Gln Phe Val Asn Val Thr Phe Gln
655 660 665 670

atg gaa ccc aca atc aca aga att gca tat gtc ttt tat ggg cca tat  3564
Met Glu Pro Thr Ile Thr Arg Ile Ala Tyr Val Phe Tyr Gly Pro Tyr
675 680 685

atc aat gtc tcc agc tgc aga ttt att gat agt tcc aat cct gga ctt  3612
Ile Asn Val Ser Ser Cys Arg Phe Ile Asp Ser Ser Asn Pro Gly Leu
690 695 700
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cag att tct etc aat gtc aat aat act gaa cat gtt gtt tct att gta 3660 

Gin lie Ser Leu Asn Val Asn Asn Thr Glu His Val Val Ser lie Val 

705 710 715 

5 

act gat tac cat aac ctg aag aca aga ttc aat tat ctg gga ttc ggt 3708 

Thr Asp Tyr His Asn Leu Lys Thr Arg Phe Asn Tyr Leu Gly Phe Gly 

720 725 730 

10 ggc ttt gec agt gtg get gat caa ggc caa ata acc cga ttt ggt ttg 3756 
Gly Phe Ala Ser Val Ala Asp Gin Gly Gin lie Thr Arg Phe Gly Leu 
735 740 745 750 

ggc act caa gca ata gta aag cct gta aga cat gat agg att att ttc 3804 
15 Gly Thr Gin Ala lie Val Lys Pro Val Arg His Asp Arg He He Phe 

755 760 765 

ccc ttt gga ttt aaa ttt aat ata gca gtt gga tta att ttg tgc att 3852 
Pro Phe Gly Phe Lys Phe Asn He Ala Val Gly Leu He Leu Cys He 
20 770 775 780 

age ttg gtg att tta act ttc caa tgg cgt ttt tac ctt tct ttt aga 3900 
Ser Leu Val He Leu Thr Phe Gin Trp Arg Phe Tyr Leu Ser Phe Arg 
785 790 795 



25 



45 



aaa eta atg cga tgg ata tta ata ctt gtt att gee ttg tgg ttt att 3948 
Lys Leu Met Arg Trp He Leu He Leu Val He Ala Leu Trp Phe He 
800 805 810 



gag ctt ttg gat gtg tgg agc act tgt agt cag ccc att tgt gca aaa 3996
Glu Leu Leu Asp Val Trp Ser Thr Cys Ser Gln Pro Ile Cys Ala Lys
815 820 825 830

tgg aca agg aca gag gct gag gga agc aag aag tct ttg tct tct gaa 4044
Trp Thr Arg Thr Glu Ala Glu Gly Ser Lys Lys Ser Leu Ser Ser Glu
835 840 845

ggg cac cac atg gat ctt cct gat gtt gtc att acc tca ctt cct ggt 4092
Gly His His Met Asp Leu Pro Asp Val Val Ile Thr Ser Leu Pro Gly
850 855 860

tca gga gct gaa att ctc aaa caa ctt ttt ttc aac agt agt gat ttt 4140
Ser Gly Ala Glu Ile Leu Lys Gln Leu Phe Phe Asn Ser Ser Asp Phe
865 870 875

ctc tac atc agg gtt cct aca gcc tac att gat att cct gaa act gag 4188
Leu Tyr Ile Arg Val Pro Thr Ala Tyr Ile Asp Ile Pro Glu Thr Glu
880 885 890

ttg gaa atc gac tca ttt gta gat gct tgt gaa tgg aag gtg tca gat 4236
Leu Glu Ile Asp Ser Phe Val Asp Ala Cys Glu Trp Lys Val Ser Asp
895 900 905 910

atc cgc agt ggg cat ttt cgt tta ctc cga ggc tgg ttg cag tct tta 4284
Ile Arg Ser Gly His Phe Arg Leu Leu Arg Gly Trp Leu Gln Ser Leu
915 920 925

gtc cag gac aca aaa tta cat ttg caa aac atc cat ctg cat gaa ccc 4332
Val Gln Asp Thr Lys Leu His Leu Gln Asn Ile His Leu His Glu Pro
930 935 940

aat agg ggt aaa ctg gcc caa tat ttt gca atg aat aag gac aaa aaa 4380
Asn Arg Gly Lys Leu Ala Gln Tyr Phe Ala Met Asn Lys Asp Lys Lys
945 950 955

aga aaa ttt aaa agg aga gag tct ttg cca gaa caa aga agt caa atg 4428
Arg Lys Phe Lys Arg Arg Glu Ser Leu Pro Glu Gln Arg Ser Gln Met
960 965 970

aaa ggc gcc ttt gat aga gat gct gaa tat att agg gct ttg agg aga 4476
Lys Gly Ala Phe Asp Arg Asp Ala Glu Tyr Ile Arg Ala Leu Arg Arg
975 980 985 990

cac ctg gtt tac tat cca agt gca cgt cct gtg ctc agt tta agc agt 4524
His Leu Val Tyr Tyr Pro Ser Ala Arg Pro Val Leu Ser Leu Ser Ser
995 1000 1005

gga agc tgg acg tta aag ctt cat ttt ttt cag gaa gtt tta gga gct 4572
Gly Ser Trp Thr Leu Lys Leu His Phe Phe Gln Glu Val Leu Gly Ala
1010 1015 1020

tcg atg agg gca ttg tac ata gta aga gac cct cgg gca tgg att tat 4620
Ser Met Arg Ala Leu Tyr Ile Val Arg Asp Pro Arg Ala Trp Ile Tyr
1025 1030 1035

tca atg ttg tac aat agt aaa cca agt ctt tat tct ttg aag aat gta 4668
Ser Met Leu Tyr Asn Ser Lys Pro Ser Leu Tyr Ser Leu Lys Asn Val
1040 1045 1050

cca gag cat tta gca aaa ttg ttt aaa ata gag gga ggt aaa ggc aaa 4716
Pro Glu His Leu Ala Lys Leu Phe Lys Ile Glu Gly Gly Lys Gly Lys
1055 1060 1065 1070

tgt aac tta aat tcg ggt tat gct ttc gag tat gaa cca ttg agg aaa 4764
Cys Asn Leu Asn Ser Gly Tyr Ala Phe Glu Tyr Glu Pro Leu Arg Lys
1075 1080 1085

gaa tta tca aaa tcc aaa tca aat gca gtg tcc ctc ttg tct cac ttg 4812
Glu Leu Ser Lys Ser Lys Ser Asn Ala Val Ser Leu Leu Ser His Leu
1090 1095 1100

tgg cta gca aat aca gca gca gcc ttg aga ata aat aca gat ttg ctg 4860
Trp Leu Ala Asn Thr Ala Ala Ala Leu Arg Ile Asn Thr Asp Leu Leu
1105 1110 1115

cct act agc tac cag ctg gtc aag ttt gaa gat att gtg cat ttt cct 4908
Pro Thr Ser Tyr Gln Leu Val Lys Phe Glu Asp Ile Val His Phe Pro
1120 1125 1130

cag aaa act act gaa agg att ttt gcc ttt ctt gga att cct ttg tct 4956
Gln Lys Thr Thr Glu Arg Ile Phe Ala Phe Leu Gly Ile Pro Leu Ser
1135 1140 1145 1150

cct gct agt tta aac caa ata ttg ttt gcc acc tct aca aac ctt ttt 5004
Pro Ala Ser Leu Asn Gln Ile Leu Phe Ala Thr Ser Thr Asn Leu Phe
1155 1160 1165

tac ctt ccc tat gaa ggg gaa ata tca cca act aat act aat gtt tgg 5052
Tyr Leu Pro Tyr Glu Gly Glu Ile Ser Pro Thr Asn Thr Asn Val Trp
1170 1175 1180

aaa cag aac ttg cct aga gat gaa att aaa cta att gaa aac atc tgc 5100
Lys Gln Asn Leu Pro Arg Asp Glu Ile Lys Leu Ile Glu Asn Ile Cys
1185 1190 1195

tgg act ctg atg gat cgc cta gga tat cca aag ttt atg gac 5142
Trp Thr Leu Met Asp Arg Leu Gly Tyr Pro Lys Phe Met Asp
1200 1205 1210

taaatgctgc aggtcagcag aaatttgcac taataatact taccaaccca ctttgtggat 5202
atgaatcaga agagtttgtt tattctttag tgtgtgtgtg tgtgtgcacg cgtgtatgtg 5262
ttcagtgttg tttgcacaga gagattgttt taaaaaatgg caccatattt ggcctagcag 5322
gatttatttt tatgtcatca cctcccttgc ctttgtttct gaaaattttg tctgctaaaa 5382
agtttctgct acagagtggt agatgaagtt atatcatggg gtcaggggag atgggaaaat 5442
tttaagtttt tgtctaactc cccttcatct gtaactgtgc taatctatct agagacctca 5502
aacactgcta aaggccttgc aattgctgct ttacccacgc atctcttgct ttcaagatgg 5562
actacaaaag ttccttatcc ttttgaaaag gtcttctgac acacttatct tgcacaaaga 5622
aaaagaaaat ttcttttact gtgtttaatg ttcagtgata tcactgagga aatggtgaaa 5682
gctcctatca gaactatagg atttcttctg ggaaatacag atggaaatac agaatgaata 5742
tgtttttttg aggtcggaaa ctgactttaa aagcctcctt gaagtttttt acttagaaat 5802
ataaggaata agtctttgaa caatctgggt ggcaagggct ggtagattat tttagacatg 5862
attgtctgtt taaaactctc ctttcacttt ttatcctccc tggagctaca gctgttcgcc 5922
atcacatcac tcccatccta tcctttctgt cactgtcaag caaaacaatc agtagttact 5982
aatcgctgaa ctctcaatat tgtggggcat tttcccccca gttgattaat tttgcgttaa 6042
agactgacac agacttagaa tcaaatttat ttttctggaa ttaacactct gtgactcaaa 6102
gtagtgccac tgcagtgtct ttttaaactg gaaacagaat tggaaaactg cctgacttat 6162
cttgcatccc tttgaatgag tttacagact gccagtgtct gcaaaagttg aaagcaaatg 6222
ggagatgatg tcagaggcat ctgtttcctt taccatctgc atcttattat aaatgtagtc 6282
gtcataaagt gtggtttatt ttattttggt aggctctgaa atcaaaatgc tacgccatta 6342
taagccagtg gagtaattac aatgtattgg atgaaaacat aaggcagtgt ggagacttga 6402
tgaaaatctc tgtacagatt gcagtcttct tcctgatgtt tcaaactgtg gttcccccaa 6462
gctctctaac acttggaagt ctgtcattct gacctagata aaagtggttc tttctcagta 6522
gttattatta tgtcaaaatg tgcctccaga gtgataaagc tctgtatatg ttagattcca 6582
gctaaaccta acttggctgt catttttctt ccattatagt gtgagtggag actgcccccc 6642
ctccccaaca tattccttcc catatctctc atgattgtcc ctctgtaatt tcaaaatgaa 6702
tgaaattcat gtgaatgtag gttgagaggg cactgaagac ctgaatctac actagtaatc 6762
tcaagaaaga ttattcattc tatctcagag ttaccggcaa gcatataaaa tgctacttgg 6822
ataatatcta catgaatatt gcatgctaca tggttgataa cactatttcc attattgggc 6882
agaatctcag tgtttacttt caattcctag gatatgtgat cgtgaatcag atcacatata 6942
aaaagtctgg attgtcagta gtattagatc tgatcaaggt aggaattaca attgcatgca 7002
ggtagcaagc aagaaagcag aaactactgt tccctttatt ttaacattgt acagacaata 7062
cagaaatgta cctgttggcg gccgggtgca gtggctcacg cctgtaatcc cagcacttcg 7122
ggaggccgag gcgggtggat cacgaggtca ggagatcaag accatcctgg ctaacacggt 7182
gaaaccccgt ctctactaaa aaaaaagtac aaaaaattag ccgggcgtgg tggcgggcac 7242
ctgtagtccc agctacacgg gaggctgagg caggagaatg gcatgaacct gggaggcaga 7302
gcttgcagtg agtggagatg cgccactgca ctccagcctg ggcgacagag cgagactccg 7362
cctcaaaaaa aaaaaaaaaa aagaaaaaaa gaaatgtacc tgttggcagg agaaggccag 7422
atggagtatg tggagtaata gggaaagaag agttacagaa aatgaaaaag aaaatgagtt 7482
acactgagaa tgaatatggg aacacgtcat tgatagcaaa agaaaggtac aggcttacga 7542
aaatgatctt tacaatgtat cccagctttc acccccacat ggcaatgcag agttgtattt 7602
acttgtttct gtactcacct actcccaccc caagggaaga ttttagacat gaaccctact 7662
atttagttat tctaaaatag aaagtttgct ggagaaagcg tctactcaca gattgttctg 7722
taaggaatgt tatgtatggg tgagcgggtg acacatccat tgggtatgta tgcatgtgat 7782
ggtgcctgag acccctgcct tagaaacaga attcctaagg ggattgactc tcccagcatg 7842
ttcccaggtc ctgcaccctt agggtgatct aggaaaattt taaatagctt ctactcttat 7902
ttttgttctt tgaaataatt aaaagaggga ttatcactat ctgatacttc tgaaagaaac 7962
acttacaaaa tttcttatct gtaaaatccg tctttttcta cattaacttc cccaaacata 8022
ggcctaattg agataattgc ttttattata ataataggat tgaaatttta aaattttgaa 8082
aggacttatt aattttgctg acaaaagtga agtaacaaat ataatgataa ttggcttttt 8142
aaattttcaa acaacataga tttactcaag atgaaataaa aaggccatat tcagagttga 8202
atttaatgaa aactcagagg aaataggaaa atctgctcag gagaaagaag ctaaatctgc 8262
atagatttag tttgtagaat ttaatttaaa atttaaattt taacaaagtg atgacacaac 8322
aatatgtacg tttaggtgtg gacaccaaaa tattagacat ttgattgtcc ttttacatag 8382
agaataacta ataaatgcct gacaagaatg ggacaatcct tccttgtatc aaaattccca 8442
ggtcttgcta cattgccctc tgcaaatgta ttcaaagaag aacctcctcc accacttact 8502
tttggttggc ataattgttc agcaacgatt tctgtacatc accaagtatc tttggcattc 8562
ttggtataca aagtatatca caattttaag tgagtaaata ttaatgataa tttttgaatt 8622
gctttgtttg gcttgattaa ctttgatcag aaatagaaac gttttcattt gttgatttag 8682
gaaaaagcat aaatagaatg cagtataaca ccacttccaa aggtaaggat acctaacatt 8742
cttttttttt tttttttttt ttttggggat ggagtctcac tttgttgccc aggctggagt 8802
gcagtggtct gatctcggct cactgcaacc tccgcctacc gggttcaagt gattctccta 8862
cgtcagcctc ctgaatagct gggattacag gtgcacgcca ccatgcttgg ctcatttttg 8922
tatttttagt agtgacagcg tttcaccaca ttggtcaggc tggntctcaa tctcttgacc 8982
tggtgatctg cccacctggg cctcccaaaa tgctgggatt acaggcatga gccaccacac 9042
ctggcaaggg tacctgacat tctaagatat caagacactt aatatgtggg ctattagctg 9102
cttatttaaa tgttgaccaa attgtctgat atatctgatt aatcatgatt tcacttcatt 9162
tcggaagaaa aattatccat atcattttta aagacgcaaa tgactttgga tttttgcata 9222
gagtacaata gacacttcaa acaatagatt ctaacattct ctgaaacact tgagatgttt 9282
gagctaccat ttatatgggt tatttatatt tagtctaagt aacacataca tgtttaattg 9342
attctgtttt catggataga ttcaactaag tcttccaagc aattaatttt ttgttcgtcg 9402
tcgtttttyc ttcatacgtt atctagttat gcagcactgg aaacagactg aagatcataa 9462
accagtttta tcagacctat gtgtaataag actcctgtta atacaaaaat aaaaagctaa 9522
aagcaa 9528
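
The listing above pairs each row of codons with a row of three-letter residue codes, so the two rows can be cross-checked against each other. The following is a minimal sketch of such a check, assuming the standard genetic code; the function names `translate_row` and `mismatches` are illustrative and are not part of the original disclosure.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered t, c, a, g.
BASES = "tcag"
AMINO_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_1 = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_1)}

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate_row(codon_row):
    """Translate a whitespace-separated codon row; '???' marks an invalid codon."""
    out = []
    for codon in codon_row.lower().split():
        one = CODON_TO_1.get(codon)
        out.append(ONE_TO_THREE[one] if one else "???")
    return out


def mismatches(codon_row, residue_row):
    """Return (index, codon, listed, translated) for each disagreement.

    Rows of unequal length are compared only up to the shorter one.
    """
    codons = codon_row.lower().split()
    listed = residue_row.split()
    translated = translate_row(codon_row)
    return [
        (i, c, l, t)
        for i, (c, l, t) in enumerate(zip(codons, listed, translated))
        if l != t
    ]


# Final coding row of SEQ ID NO: 1 and its residues.
row = "tgg act ctg atg gat cgc cta gga tat cca aag ttt atg gac"
res = "Trp Thr Leu Met Asp Arg Leu Gly Tyr Pro Lys Phe Met Asp"
print(mismatches(row, res))
```

A row containing a character that is not a nucleotide code (for example a transcription slip such as `gee` for `gcc`) is reported with `???` in the translated position, which makes such slips easy to locate.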

<210> 2
<211> 1212
<212> PRT
<213> Homo sapiens

<220>
<221> Amino acid sequence encoding Human NCAG1 protein
<400> 2

Met Ala Leu Met Phe Thr Gly His Leu Leu Phe Leu Ala Leu Leu Met
1 5 10 15

Phe Ala Phe Ser Thr Phe Glu Glu Ser Val Ser Asn Tyr Ser Glu Trp
20 25 30

Ala Val Phe Thr Asp Asp Ile Asp Gln Phe Lys Thr Gln Lys Val Gln
35 40 45

Asp Phe Arg Pro Asn Gln Lys Leu Lys Lys Ser Met Leu His Pro Ser
50 55 60

Leu Tyr Phe Asp Ala Gly Glu Ile Gln Ala Met Arg Gln Lys Ser Arg
65 70 75 80

Ala Ser His Leu His Leu Phe Arg Ala Ile Arg Ser Ala Val Thr Val
85 90 95

Met Leu Ser Asn Pro Thr Tyr Tyr Leu Pro Pro Pro Lys His Ala Asp
100 105 110

Phe Ala Ala Lys Trp Asn Glu Ile Tyr Gly Asn Asn Leu Pro Pro Leu
115 120 125

Ala Leu Tyr Cys Leu Leu Cys Pro Glu Asp Lys Val Ala Phe Glu Phe
130 135 140

Val Leu Glu Tyr Met Asp Arg Met Val Gly Tyr Lys Asp Trp Leu Val



WO 02/101044 



10/22 



PCT/EP02/06316 



145 150 155 160



Glu Asn Ala Pro Gly Asp Glu Val Pro Ile Gly His Ser Leu Thr Gly
165 170 175

Phe Ala Thr Ala Phe Asp Phe Leu Tyr Asn Leu Leu Asp Asn His Arg
180 185 190

Arg Gln Lys Tyr Leu Glu Lys Ile Trp Val Ile Thr Glu Glu Met Tyr
195 200 205

Glu Tyr Ser Lys Val Arg Ser Trp Gly Lys Gln Leu Leu His Asn His
210 215 220

Gln Ala Thr Asn Met Ile Ala Leu Leu Thr Gly Ala Leu Val Thr Gly
225 230 235 240

Val Asp Lys Gly Ser Lys Ala Asn Ile Trp Lys Gln Ala Val Val Asp
245 250 255

Val Met Glu Lys Thr Met Phe Leu Leu Asn His Ile Val Asp Gly Ser
260 265 270

Leu Asp Glu Gly Val Ala Tyr Gly Ser Tyr Thr Ala Lys Ser Val Thr
275 280 285

Gln Tyr Val Phe Leu Ala Gln Arg His Phe Asn Ile Asn Asn Leu Asp
290 295 300

Asn Asn Trp Leu Lys Met His Phe Trp Phe Tyr Tyr Ala Thr Leu Leu
305 310 315 320

Pro Gly Phe Gln Arg Thr Val Gly Ile Ala Asp Ser Asn Tyr Asn Trp
325 330 335

Phe Tyr Gly Pro Glu Ser Gln Leu Val Phe Leu Asp Lys Phe Ile Leu
340 345 350

Lys Asn Gly Ala Gly Asn Trp Leu Ala Gln Gln Ile Arg Lys His Arg
355 360 365

Pro Lys Asp Gly Pro Met Val Pro Ser Thr Ala Gln Arg Trp Ser Thr
370 375 380

Leu His Thr Glu Tyr Ile Trp Tyr Asp Pro Gln Leu Thr Pro Gln Pro
385 390 395 400

Pro Ala Asp Tyr Gly Thr Ala Lys Ile His Thr Phe Pro Asn Trp Gly
405 410 415

Val Val Thr Tyr Gly Ala Gly Leu Pro Asn Thr Gln Thr Asn Thr Phe
420 425 430

Val Ser Phe Lys Ser Gly Lys Leu Gly Gly Arg Ala Val Tyr Asp Ile
435 440 445

Val His Phe Gln Pro Tyr Ser Trp Ile Asp Gly Trp Arg Ser Phe Asn
450 455 460

Pro Gly His Glu His Pro Asp Gln Asn Ser Phe Thr Phe Ala Pro Asn
465 470 475 480

Gly Gln Val Phe Val Ser Glu Ala Leu Tyr Gly Pro Lys Leu Ser His
485 490 495

Leu Asn Asn Val Leu Val Phe Ala Pro Ser Pro Ser Ser Gln Cys Asn
500 505 510

Lys Pro Trp Glu Gly Gln Leu Gly Glu Cys Ala Gln Trp Leu Lys Trp
515 520 525

Thr Gly Glu Glu Val Gly Asp Ala Ala Gly Glu Ile Ile Thr Ala Ser
530 535 540

Gln His Gly Glu Met Val Phe Val Ser Gly Glu Ala Val Ser Ala Tyr
545 550 555 560

Ser Ser Ala Met Arg Leu Lys Ser Val Tyr Arg Ala Leu Leu Leu Leu
565 570 575

Asn Ser Gln Thr Leu Leu Val Val Asp His Ile Glu Arg Gln Glu Asp
580 585 590

Ser Pro Ile Asn Ser Val Ser Ala Phe Phe His Asn Leu Asp Ile Asp
595 600 605

Phe Lys Tyr Ile Pro Tyr Lys Phe Met Asn Arg Tyr Asn Gly Ala Met
610 615 620

Met Asp Val Trp Asp Ala His Tyr Lys Met Phe Trp Phe Asp His His
625 630 635 640

Gly Asn Ser Pro Met Ala Ser Ile Gln Glu Ala Glu Gln Ala Ala Glu
645 650 655

Phe Lys Lys Arg Trp Thr Gln Phe Val Asn Val Thr Phe Gln Met Glu
660 665 670

Pro Thr Ile Thr Arg Ile Ala Tyr Val Phe Tyr Gly Pro Tyr Ile Asn
675 680 685

Val Ser Ser Cys Arg Phe Ile Asp Ser Ser Asn Pro Gly Leu Gln Ile
690 695 700

Ser Leu Asn Val Asn Asn Thr Glu His Val Val Ser Ile Val Thr Asp
705 710 715 720

Tyr His Asn Leu Lys Thr Arg Phe Asn Tyr Leu Gly Phe Gly Gly Phe
725 730 735

Ala Ser Val Ala Asp Gln Gly Gln Ile Thr Arg Phe Gly Leu Gly Thr
740 745 750

Gln Ala Ile Val Lys Pro Val Arg His Asp Arg Ile Ile Phe Pro Phe
755 760 765

Gly Phe Lys Phe Asn Ile Ala Val Gly Leu Ile Leu Cys Ile Ser Leu
770 775 780

Val Ile Leu Thr Phe Gln Trp Arg Phe Tyr Leu Ser Phe Arg Lys Leu
785 790 795 800

Met Arg Trp Ile Leu Ile Leu Val Ile Ala Leu Trp Phe Ile Glu Leu
805 810 815

Leu Asp Val Trp Ser Thr Cys Ser Gln Pro Ile Cys Ala Lys Trp Thr



820 825 830 

Arg Thr Glu Ala Glu Gly Ser Lys Lys Ser Leu Ser Ser Glu Gly His
835 840 845

His Met Asp Leu Pro Asp Val Val Ile Thr Ser Leu Pro Gly Ser Gly
850 855 860

Ala Glu Ile Leu Lys Gln Leu Phe Phe Asn Ser Ser Asp Phe Leu Tyr
865 870 875 880

Ile Arg Val Pro Thr Ala Tyr Ile Asp Ile Pro Glu Thr Glu Leu Glu
885 890 895

Ile Asp Ser Phe Val Asp Ala Cys Glu Trp Lys Val Ser Asp Ile Arg
900 905 910

Ser Gly His Phe Arg Leu Leu Arg Gly Trp Leu Gln Ser Leu Val Gln
915 920 925

Asp Thr Lys Leu His Leu Gln Asn Ile His Leu His Glu Pro Asn Arg
930 935 940

Gly Lys Leu Ala Gln Tyr Phe Ala Met Asn Lys Asp Lys Lys Arg Lys
945 950 955 960

Phe Lys Arg Arg Glu Ser Leu Pro Glu Gln Arg Ser Gln Met Lys Gly
965 970 975

Ala Phe Asp Arg Asp Ala Glu Tyr Ile Arg Ala Leu Arg Arg His Leu
980 985 990

Val Tyr Tyr Pro Ser Ala Arg Pro Val Leu Ser Leu Ser Ser Gly Ser
995 1000 1005

Trp Thr Leu Lys Leu His Phe Phe Gln Glu Val Leu Gly Ala Ser Met
1010 1015 1020

Arg Ala Leu Tyr Ile Val Arg Asp Pro Arg Ala Trp Ile Tyr Ser Met
1025 1030 1035 1040

Leu Tyr Asn Ser Lys Pro Ser Leu Tyr Ser Leu Lys Asn Val Pro Glu
1045 1050 1055

His Leu Ala Lys Leu Phe Lys Ile Glu Gly Gly Lys Gly Lys Cys Asn
1060 1065 1070

Leu Asn Ser Gly Tyr Ala Phe Glu Tyr Glu Pro Leu Arg Lys Glu Leu
1075 1080 1085

Ser Lys Ser Lys Ser Asn Ala Val Ser Leu Leu Ser His Leu Trp Leu
1090 1095 1100

Ala Asn Thr Ala Ala Ala Leu Arg Ile Asn Thr Asp Leu Leu Pro Thr
1105 1110 1115 1120

Ser Tyr Gln Leu Val Lys Phe Glu Asp Ile Val His Phe Pro Gln Lys
1125 1130 1135

Thr Thr Glu Arg Ile Phe Ala Phe Leu Gly Ile Pro Leu Ser Pro Ala
1140 1145 1150

Ser Leu Asn Gln Ile Leu Phe Ala Thr Ser Thr Asn Leu Phe Tyr Leu
1155 1160 1165

Pro Tyr Glu Gly Glu Ile Ser Pro Thr Asn Thr Asn Val Trp Lys Gln
1170 1175 1180

Asn Leu Pro Arg Asp Glu Ile Lys Leu Ile Glu Asn Ile Cys Trp Thr
1185 1190 1195 1200

Leu Met Asp Arg Leu Gly Tyr Pro Lys Phe Met Asp
1205 1210



<210> 3 
<211> 5092
<212> DNA
<213> Mus sp.

<220>
<221> CDS encoding the mouse NCAG1 protein
<222> (501)..(4121)
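
As a quick consistency check on the feature location, the inclusive span (501)..(4121) should cover a whole number of codons, and that number should agree with the length given under <211> for the mouse protein (SEQ ID NO: 4, 1207 residues). The sketch below only performs this arithmetic; the function name `cds_codon_count` is illustrative and is not part of the original disclosure.

```python
def cds_codon_count(start, end):
    """Number of codons in an inclusive, 1-based CDS span."""
    length = end - start + 1
    if length <= 0 or length % 3:
        raise ValueError("CDS span is not a positive multiple of three")
    return length // 3


# Span taken from the <222> line of SEQ ID NO: 3.
print(cds_codon_count(501, 4121))
```

Under these assumptions the span holds 3621 nucleotides, that is, 1207 codons, with the stop codon lying immediately after position 4121.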



<400> 3 



tctgagaatg acagtacttt atcatcttct tttggggaac atacagaaac ataccattta 60
tgtgtggtaa gttaatcact acagatggtt tcttgtgcta cgtggtcaaa tggcttcatt 120
tgaattttgg aattttaaaa aattttttct ttttcacatg ttaattagat ttacacacag 180
ggagtaaatg ttggatttgt tgtattttct gactagacca ctgttttctg tgcattggag 240
acattggagg cattaatatt ccttgaaatt ttattttatt ggaagcaaac ctgtgccagg 300
gacacagaca tgctatataa tttcctaact tttcttgctt tgaataagct gaatgtcacc 360
tggatttcac agcctatgag gtatagtctg ttttttgttt ttgttttttt gctacatctt 420
taatatataa tttacaataa ccagatggga aacactgtgc ttaacacata tgcctaagga 480
aaagatcttc cccatggatc atg gcg ttt atg ttt aca gaa cat tta cta ttt 533
Met Ala Phe Met Phe Thr Glu His Leu Leu Phe
10

tta aca ttg atg atg tgt agt ttt tct act tgt gaa gaa tct gtg agc 581
Leu Thr Leu Met Met Cys Ser Phe Ser Thr Cys Glu Glu Ser Val Ser
15 20 25

aat tat tct gaa tgg gca gtt ttc aca gac gat ata caa tgg ctt aag 629
Asn Tyr Ser Glu Trp Ala Val Phe Thr Asp Asp Ile Gln Trp Leu Lys
30 35 40

tca cag aaa ata caa gat ttc aaa ctc aac cga aga ctt cat cca aat 677
Ser Gln Lys Ile Gln Asp Phe Lys Leu Asn Arg Arg Leu His Pro Asn
45 50 55

tta tat ttt gat gct gga gat ata caa aca ttg aaa caa aag tct cgt 725
Leu Tyr Phe Asp Ala Gly Asp Ile Gln Thr Leu Lys Gln Lys Ser Arg
60 65 70 75

aca agc cat ttg cat att ttt aga gct atc aaa agt gca gtg aca att 773
Thr Ser His Leu His Ile Phe Arg Ala Ile Lys Ser Ala Val Thr Ile
80 85 90

atg ctg tcc aat cca tca tac tac cta cct cca ccc aag cat gct gag 821
Met Leu Ser Asn Pro Ser Tyr Tyr Leu Pro Pro Pro Lys His Ala Glu
95 100 105

ttt gct gcc aag tgg aat gaa att tat ggt aat aat ctt cct cct tta 869
Phe Ala Ala Lys Trp Asn Glu Ile Tyr Gly Asn Asn Leu Pro Pro Leu
110 115 120

gca ttg tat tgt tta tta tgc cca gaa gac aag gtt gcc ttt gaa ttt 917
Ala Leu Tyr Cys Leu Leu Cys Pro Glu Asp Lys Val Ala Phe Glu Phe
125 130 135

gtt atg gaa tac atg gat cgg atg gtt agc tac aaa gac tgg cta gtt 965
Val Met Glu Tyr Met Asp Arg Met Val Ser Tyr Lys Asp Trp Leu Val
140 145 150 155

gag aat gca cca ggg gat gag gtt cca gtt ggc cat tct tta aca ggt 1013
Glu Asn Ala Pro Gly Asp Glu Val Pro Val Gly His Ser Leu Thr Gly
160 165 170

ttt gcc act gcc ttt gac ttt tta tat aat cta tta ggt aat cag cgt 1061
Phe Ala Thr Ala Phe Asp Phe Leu Tyr Asn Leu Leu Gly Asn Gln Arg
175 180 185

aaa caa aaa tac cta gaa aaa att tgg att gtt act gag gaa atg tat 1109
Lys Gln Lys Tyr Leu Glu Lys Ile Trp Ile Val Thr Glu Glu Met Tyr
190 195 200

gaa tat tcc aag att cga tca tgg ggc aaa caa ctt ctt cat aac cat 1157
Glu Tyr Ser Lys Ile Arg Ser Trp Gly Lys Gln Leu Leu His Asn His
205 210 215

caa gct aca aat atg ata gct tta ctc ata ggg gcc ttg gtt act gga 1205
Gln Ala Thr Asn Met Ile Ala Leu Leu Ile Gly Ala Leu Val Thr Gly
220 225 230 235

gta gat aaa gga tct aaa gca aac ata tgg aaa caa gtt gtt gtt gat 1253
Val Asp Lys Gly Ser Lys Ala Asn Ile Trp Lys Gln Val Val Val Asp
240 245 250

gtg atg gaa aag act atg ttt ctc ttg aag cat att gta gat ggc tca 1301
Val Met Glu Lys Thr Met Phe Leu Leu Lys His Ile Val Asp Gly Ser
255 260 265

ttg gat gaa ggt gtg gcc tat gga agc tat acc tca aaa tca gtt aca 1349
Leu Asp Glu Gly Val Ala Tyr Gly Ser Tyr Thr Ser Lys Ser Val Thr
270 275 280

cag tat gtt ttt ttg gca caa cgc cat ttt aac atc aac aac ttt gat 1397
Gln Tyr Val Phe Leu Ala Gln Arg His Phe Asn Ile Asn Asn Phe Asp
285 290 295

aat aac tgg cta aaa atg cat ttt tgg ttt tat tat gct aca ctt ttg 1445
Asn Asn Trp Leu Lys Met His Phe Trp Phe Tyr Tyr Ala Thr Leu Leu
300 305 310 315

cca ggc tat caa aga act gta ggc ata gca gat tcc aat tat aat tgg 1493
Pro Gly Tyr Gln Arg Thr Val Gly Ile Ala Asp Ser Asn Tyr Asn Trp
320 325 330

ttt tat ggt cca gag agc cag cta gtt ttc ttg gat aag ttc att tta 1541
Phe Tyr Gly Pro Glu Ser Gln Leu Val Phe Leu Asp Lys Phe Ile Leu
335 340 345

cag aat gga gct gga aat tgg tta gct cag caa att aga aag cat cga 1589
Gln Asn Gly Ala Gly Asn Trp Leu Ala Gln Gln Ile Arg Lys His Arg
350 355 360

cct aag gat gga cca atg gtt cct tcc act gct cag cgg tgg agt act 1637
Pro Lys Asp Gly Pro Met Val Pro Ser Thr Ala Gln Arg Trp Ser Thr
365 370 375

ctt cat act gaa tac atc tgg tat gat cca aca ctc acc cca cag cct 1685
Leu His Thr Glu Tyr Ile Trp Tyr Asp Pro Thr Leu Thr Pro Gln Pro
380 385 390 395

cct gtt gat ttt ggc act gca aaa atg cac aca ttt cct aac tgg ggt 1733
Pro Val Asp Phe Gly Thr Ala Lys Met His Thr Phe Pro Asn Trp Gly
400 405 410

gtc gtg act tat ggg ggt ggg ctg cca aac acc cag acc aat acc ttt 1781
Val Val Thr Tyr Gly Gly Gly Leu Pro Asn Thr Gln Thr Asn Thr Phe
415 420 425

gtg tct ttt aaa tct ggg aaa ctg gga gga cga gct gtg tat gac ata 1829
Val Ser Phe Lys Ser Gly Lys Leu Gly Gly Arg Ala Val Tyr Asp Ile
430 435 440

gtt cac ttt cag cca tat tcc tgg att gat gga tgg aga agc ttt aac 1877
Val His Phe Gln Pro Tyr Ser Trp Ile Asp Gly Trp Arg Ser Phe Asn
445 450 455

cca gga cat gaa cat cca gat caa aat tca ttt act ttc gct cct aat 1925
Pro Gly His Glu His Pro Asp Gln Asn Ser Phe Thr Phe Ala Pro Asn
460 465 470 475

ggg cag gta ttc gtt tct gag gct ctt tat gga cca aaa ttg agc cac 1973
Gly Gln Val Phe Val Ser Glu Ala Leu Tyr Gly Pro Lys Leu Ser His
480 485 490

ctt aac aac gta ttg gtg ttt gcc cca tca cca tca agt caa tgt aat 2021
Leu Asn Asn Val Leu Val Phe Ala Pro Ser Pro Ser Ser Gln Cys Asn
495 500 505

cag ccc tgg gaa ggt caa ctg gga gaa tgt gca cag tgg ctc aag tgg 2069
Gln Pro Trp Glu Gly Gln Leu Gly Glu Cys Ala Gln Trp Leu Lys Trp
510 515 520

act ggg gaa gag gtt ggt gat gca gct ggg gaa gtt att act gct gct 2117
Thr Gly Glu Glu Val Gly Asp Ala Ala Gly Glu Val Ile Thr Ala Ala
525 530 535

caa cat ggt gat agg atg ttt gtg agt ggg gaa gca gtg tct gct tat 2165
Gln His Gly Asp Arg Met Phe Val Ser Gly Glu Ala Val Ser Ala Tyr
540 545 550 555

tct tct gcc atg aga ctg aaa agt gtc tat cgt gct tta ctt ctt tta 2213
Ser Ser Ala Met Arg Leu Lys Ser Val Tyr Arg Ala Leu Leu Leu Leu
560 565 570

aat tca caa act ctg ctt gtt gtc gat cat att gaa agg caa gaa act 2261
Asn Ser Gln Thr Leu Leu Val Val Asp His Ile Glu Arg Gln Glu Thr
575 580 585

tcc cca ata aat tct gtc agt gcc ttc ttt cat aat ttg gat att gat 2309
Ser Pro Ile Asn Ser Val Ser Ala Phe Phe His Asn Leu Asp Ile Asp
590 595 600

ttt aaa tac atc cca tac aag ttt atg aat aga tat aat ggt gcc atg 2357
Phe Lys Tyr Ile Pro Tyr Lys Phe Met Asn Arg Tyr Asn Gly Ala Met
605 610 615

atg gat gtg tgg gat gca cac tat aaa atg ttt tgg ttt gat cac cat 2405
Met Asp Val Trp Asp Ala His Tyr Lys Met Phe Trp Phe Asp His His
620 625 630 635

ggc aac agt cct gtg gct aat ata cag gaa gca gaa cag gct gct gaa 2453
Gly Asn Ser Pro Val Ala Asn Ile Gln Glu Ala Glu Gln Ala Ala Glu
640 645 650

ttt aag aaa cgg tgg aca cag ttt gtt aat gtt aca ttt cat atg gaa 2501
Phe Lys Lys Arg Trp Thr Gln Phe Val Asn Val Thr Phe His Met Glu
655 660 665

tcc aca atc aca aga att gct tat gta ttt tat ggg cca tat gtc aat 2549
Ser Thr Ile Thr Arg Ile Ala Tyr Val Phe Tyr Gly Pro Tyr Val Asn
670 675 680

gtt tcc agc tgc aga ttt att gat agt tcc agt tct gga ctt cag att 2597
Val Ser Ser Cys Arg Phe Ile Asp Ser Ser Ser Ser Gly Leu Gln Ile
685 690 695

tct tta cat gtc aac agt act gaa cat agt gtg tct gtt gta act gac 2645
Ser Leu His Val Asn Ser Thr Glu His Ser Val Ser Val Val Thr Asp
700 705 710 715

tat caa aac ctt aaa agc aga ttc agt tac ctg gga ttt ggt ggt ttt 2693
Tyr Gln Asn Leu Lys Ser Arg Phe Ser Tyr Leu Gly Phe Gly Gly Phe
720 725 730

gcc agt gtg gct aat caa gga cag ata acc aga ttt ggt ttg ggt act 2741
Ala Ser Val Ala Asn Gln Gly Gln Ile Thr Arg Phe Gly Leu Gly Thr
735 740 745

caa gaa ata gta aac cct gta aga cat gat aaa gtt aat ttc ccc ttt 2789
Gln Glu Ile Val Asn Pro Val Arg His Asp Lys Val Asn Phe Pro Phe
750 755 760

ggg ttt aaa ttt aat ata gca gtt gga ttc att ttg tgt att agt ttg 2837
Gly Phe Lys Phe Asn Ile Ala Val Gly Phe Ile Leu Cys Ile Ser Leu
765 770 775

gtt att tta act ttt caa tgg cgg ttt tac ctt tcc ttt aga aag cta 2885
Val Ile Leu Thr Phe Gln Trp Arg Phe Tyr Leu Ser Phe Arg Lys Leu
780 785 790 795

atg cgc tgt gta tta ata ctt gtt att gcc ttg tgg ttt att gag ctt 2933
Met Arg Cys Val Leu Ile Leu Val Ile Ala Leu Trp Phe Ile Glu Leu
800 805 810

ctg gat gta tgg agt aca tgc act cag ccc atc tgt gca aaa tgg aca 2981
Leu Asp Val Trp Ser Thr Cys Thr Gln Pro Ile Cys Ala Lys Trp Thr
815 820 825

agg act gaa gct aag gca aat gag aag gtc atg att tct gaa ggg cat 3029
Arg Thr Glu Ala Lys Ala Asn Glu Lys Val Met Ile Ser Glu Gly His
830 835 840

cat gtg gat ctt cct aat gtt att att acc tca ctc cct ggt tca gga 3077
His Val Asp Leu Pro Asn Val Ile Ile Thr Ser Leu Pro Gly Ser Gly
845 850 855

gct gaa att ctc aaa cag ctt ttt ttc aac agc agt gat ttt ctc tac 3125
Ala Glu Ile Leu Lys Gln Leu Phe Phe Asn Ser Ser Asp Phe Leu Tyr
860 865 870 875

atc aga att cct aca gcc tac atg gat atc cct gaa act gaa ttt gaa 3173
Ile Arg Ile Pro Thr Ala Tyr Met Asp Ile Pro Glu Thr Glu Phe Glu
880 885 890

att gac tca ttt gta gat gct tgt gag tgg aaa gta tca gat atc cgc 3221
Ile Asp Ser Phe Val Asp Ala Cys Glu Trp Lys Val Ser Asp Ile Arg
895 900 905

agt ggg cac ttt cat ctt ctt cga ggg tgg ctg cag tct ttg gtc cag 3269
Ser Gly His Phe His Leu Leu Arg Gly Trp Leu Gln Ser Leu Val Gln
910 915 920

gat aca aaa ctt cac ttg caa aac atc cat cta cat gaa acc agt agg 3317
Asp Thr Lys Leu His Leu Gln Asn Ile His Leu His Glu Thr Ser Arg
925 930 935

agt aaa ctg gcc caa tat ttt aca act aat aag gac aaa aag cga aaa 3365
Ser Lys Leu Ala Gln Tyr Phe Thr Thr Asn Lys Asp Lys Lys Arg Lys
940 945 950 955

tta aaa aga agg gag tct ttg caa gat caa aga agt aga ata aaa gga 3413
Leu Lys Arg Arg Glu Ser Leu Gln Asp Gln Arg Ser Arg Ile Lys Gly
960 965 970

cca ttt gat aga gat gct gaa tat att agg gct tta aga aga cac ctt 3461
Pro Phe Asp Arg Asp Ala Glu Tyr Ile Arg Ala Leu Arg Arg His Leu
975 980 985

gtt tat tac cca agt gca cgt cct gtg ctc agc tta agt agt ggt agc 3509
Val Tyr Tyr Pro Ser Ala Arg Pro Val Leu Ser Leu Ser Ser Gly Ser
990 995 1000

tgg aca ttg aag ctt cat ttt ttt cag gaa gtt tta gga act tca atg 3557
Trp Thr Leu Lys Leu His Phe Phe Gln Glu Val Leu Gly Thr Ser Met
1005 1010 1015

cgg gca ttg tac ata gta aga gac cct cga gct tgg atc tat tca gtg 3605
Arg Ala Leu Tyr Ile Val Arg Asp Pro Arg Ala Trp Ile Tyr Ser Val
1020 1025 1030 1035

cta tat ggt agt aaa cca agt ctt tat tct ttg aag aat gta cca gag 3653
Leu Tyr Gly Ser Lys Pro Ser Leu Tyr Ser Leu Lys Asn Val Pro Glu
1040 1045 1050

cac tta gca aaa ttg ttt aaa ata gag gaa ggt aaa agc aaa tgt aat 3701
His Leu Ala Lys Leu Phe Lys Ile Glu Glu Gly Lys Ser Lys Cys Asn
1055 1060 1065

tcg aat tct ggc tat gct ttt gag tat gaa tca ctg aag aaa gaa tta 3749
Ser Asn Ser Gly Tyr Ala Phe Glu Tyr Glu Ser Leu Lys Lys Glu Leu
1070 1075 1080

gaa ata tcc caa tca aat gct atc tcc tta tta tct cat ttg tgg gta 3797
Glu Ile Ser Gln Ser Asn Ala Ile Ser Leu Leu Ser His Leu Trp Val
1085 1090 1095

gca aac act gca gca gcc ttg aga ata aat aca gat ttg ctg cct acc 3845
Ala Asn Thr Ala Ala Ala Leu Arg Ile Asn Thr Asp Leu Leu Pro Thr
1100 1105 1110 1115

aat tac cat ctg gtc aag ttt gaa gat att gtt cat ttt cct cag aag 3893
Asn Tyr His Leu Val Lys Phe Glu Asp Ile Val His Phe Pro Gln Lys
1120 1125 1130

act act gaa agg att ttt gct ttc ctt ggc att cct ttg tct cct gct 3941
Thr Thr Glu Arg Ile Phe Ala Phe Leu Gly Ile Pro Leu Ser Pro Ala
1135 1140 1145

agt tta aac caa atg cta ttt gcc act tcc aca aac ctt ttt tat ctt 3989
Ser Leu Asn Gln Met Leu Phe Ala Thr Ser Thr Asn Leu Phe Tyr Leu
1150 1155 1160

cca tat gag ggg gaa ata tca cca tct aat act aat att tgg aaa aca 4037
Pro Tyr Glu Gly Glu Ile Ser Pro Ser Asn Thr Asn Ile Trp Lys Thr
1165 1170 1175

aac ttg cct aga gat gaa att aaa cta att gaa aac att tgc tgg aca 4085
Asn Leu Pro Arg Asp Glu Ile Lys Leu Ile Glu Asn Ile Cys Trp Thr
1180 1185 1190 1195

ctg atg gat cat cta gga tat cca aag ttt atg gac taaatgctgc 4131
Leu Met Asp His Leu Gly Tyr Pro Lys Phe Met Asp
1200 1205

aggtcggcaa aatttgcact aatgtgtccc aacctacttt gtggatatga actagaaaac 4191
tttgtttatt cttgtacatg tatgtatgtg tgtagagtga gtgcgtgtgt ccagtatgtt 4251
atttgcacag agatattttc aaaataggca ccatatttgg cctagcagga tttattttta 4311
tgttaccact tttcttgcct ttgtttctga atttttttct gctaaaatgt ttctgctaca 4371
gaggtatata ttctggggtt ctgaaatatg gggttttaat ggactttaac tcaacttctt 4431
tggaaactat ttatctatct taggacctca aacactacaa acggccttgc aattgctgct 4491
gtatctagtc atctctcgct cttaatatgg actacaaaac tttatgtttt gaaaacgtct 4551
aacatttacc ttgcacacaa aaacgagaaa taaaaaaaca aaaattattt tacgttgtat 4611
agtgtttatt gaaatcactt ggtgaggctg gggggaggag cttatgataa agttccctta 4671
agaaactaga aaataaagat gaaaacatag aattaaggtt tttttgtttc tttcttcctt 4731
tttttttttt ttttgtacta agaaataaga ttgaacagtg gatactgaaa tttggtgaat 4791
tattttggaa gtgattctct catttgtctt tctgaagcta cagctgttca tcatcacact 4851
acccttaccc tgtctatcca ttctgtcatt gtcaccaaaa aaaaaaagtc agtaattact 4911
agctacaaaa ctatctaaca agcccttctc tggatgattt actttgtgtt aaagacttac 4971
acagatttat aatcacattt agttgtgtgg cattaccaca atatgactca aagcaaaagc 5031
agacttctgt ctgttgtagt gtttttaagt gtgtgttgtg gggtggggga gggsrsdbac 5091 




5092 



<210> 4 
<211> 1207 
<212> PRT 
<213> Mus sp. 

<220> 

<221> Amino acid sequence encoding Mouse NCAG1 protein 
<400> 4 

10 Met Ala Phe Met Phe Thr Glu His Leu Leu Phe Leu Thr Leu Met Met 
15 10 15 

Cys Ser Phe Ser Thr Cys Glu Glu Ser Val Ser Asn Tyr Ser Glu Trp 
20 25 30 



15 



Ala Val Phe Thr Asp Asp lie Gin Trp Leu Lys Ser Gin Lys lie Gin 
35 40 45 



Asp Phe Lys Leu Asn Arg Arg Leu His Pro Asn Leu Tyr Phe Asp Ala 

20 50 55 60 

Gly Asp He Gin Thr Leu Lys Gin Lys Ser Arg Thr Ser His Leu His 

65 70 75 80 

25 He Phe Arg Ala He Lys Ser Ala Val Thr He Met Leu Ser Asn Pro 

85 90 95 



30 



Ser Tyr Tyr Leu Pro Pro Pro Lys His Ala Glu Phe Ala Ala Lys Trp 
100 105 HO 

Asn Glu He Tyr Gly Asn Asn Leu Pro Pro Leu Ala Leu Tyr Cys Leu 
115 120 125 



Leu Cys Pro Glu Asp Lys Val Ala Phe Glu Phe Val Met Glu Tyr Met 
35 130 135 140 

Asp Arg Met Val Ser Tyr Lys Asp Trp Leu Val Glu Asn Ala Pro Gly 
145 150 155 160 

40 Asp Glu Val Pro Val Gly His Ser Leu Thr Gly Phe Ala Thr Ala Phe 

165 170 175 

Asp Phe Leu Tyr Asn Leu Leu Gly Asn Gin Arg Lys Gin Lys Tyr Leu 
180 185 190 

45 

Glu Lys He Trp He Val Thr Glu Glu Met Tyr Glu Tyr Ser Lys He 
195 200 205 

Arg Ser Trp Gly Lys Gin Leu Leu His Asn His Gin Ala Thr Asn Met 
50 210 215 220 

He Ala Leu Leu He Gly Ala Leu Val Thr Gly Val Asp Lys Gly Ser 
225 230 235 240 

Lys Ala Asn Ile Trp Lys Gln Val Val Val Asp Val Met Glu Lys Thr
245 250 255

Met Phe Leu Leu Lys His Ile Val Asp Gly Ser Leu Asp Glu Gly Val
260 265 270 

Ala Tyr Gly Ser Tyr Thr Ser Lys Ser Val Thr Gln Tyr Val Phe Leu
275 280 285 
Ala Gln Arg His Phe Asn Ile Asn Asn Phe Asp Asn Asn Trp Leu Lys
290 295 300 

Met His Phe Trp Phe Tyr Tyr Ala Thr Leu Leu Pro Gly Tyr Gln Arg
305 310 315 320

Thr Val Gly Ile Ala Asp Ser Asn Tyr Asn Trp Phe Tyr Gly Pro Glu
325 330 335 

Ser Gln Leu Val Phe Leu Asp Lys Phe Ile Leu Gln Asn Gly Ala Gly
340 345 350 

Asn Trp Leu Ala Gln Gln Ile Arg Lys His Arg Pro Lys Asp Gly Pro
355 360 365 

Met Val Pro Ser Thr Ala Gln Arg Trp Ser Thr Leu His Thr Glu Tyr
370 375 380 

Ile Trp Tyr Asp Pro Thr Leu Thr Pro Gln Pro Pro Val Asp Phe Gly
385 390 395 400

Thr Ala Lys Met His Thr Phe Pro Asn Trp Gly Val Val Thr Tyr Gly 
405 410 415 

Gly Gly Leu Pro Asn Thr Gln Thr Asn Thr Phe Val Ser Phe Lys Ser
420 425 430 

Gly Lys Leu Gly Gly Arg Ala Val Tyr Asp Ile Val His Phe Gln Pro
435 440 445 

Tyr Ser Trp Ile Asp Gly Trp Arg Ser Phe Asn Pro Gly His Glu His
450 455 460 

Pro Asp Gln Asn Ser Phe Thr Phe Ala Pro Asn Gly Gln Val Phe Val
465 470 475 480

Ser Glu Ala Leu Tyr Gly Pro Lys Leu Ser His Leu Asn Asn Val Leu 
485 490 495 

Val Phe Ala Pro Ser Pro Ser Ser Gln Cys Asn Gln Pro Trp Glu Gly
500 505 510 



Gln Leu Gly Glu Cys Ala Gln Trp Leu Lys Trp Thr Gly Glu Glu Val
515 520 525

Gly Asp Ala Ala Gly Glu Val Ile Thr Ala Ala Gln His Gly Asp Arg
530 535 540 



Met Phe Val Ser Gly Glu Ala Val Ser Ala Tyr Ser Ser Ala Met Arg 
545 550 555 560

Leu Lys Ser Val Tyr Arg Ala Leu Leu Leu Leu Asn Ser Gln Thr Leu
565 570 575 

Leu Val Val Asp His Ile Glu Arg Gln Glu Thr Ser Pro Ile Asn Ser
580 585 590 



Val Ser Ala Phe Phe His Asn Leu Asp Ile Asp Phe Lys Tyr Ile Pro
595 600 605 

Tyr Lys Phe Met Asn Arg Tyr Asn Gly Ala Met Met Asp Val Trp Asp 
610 615 620 
Ala His Tyr Lys Met Phe Trp Phe Asp His His Gly Asn Ser Pro Val
625 630 635 640 

Ala Asn Ile Gln Glu Ala Glu Gln Ala Ala Glu Phe Lys Lys Arg Trp
645 650 655

Thr Gln Phe Val Asn Val Thr Phe His Met Glu Ser Thr Ile Thr Arg
660 665 670 

Ile Ala Tyr Val Phe Tyr Gly Pro Tyr Val Asn Val Ser Ser Cys Arg
675 680 685 



Phe Ile Asp Ser Ser Ser Ser Gly Leu Gln Ile Ser Leu His Val Asn
690 695 700

Ser Thr Glu His Ser Val Ser Val Val Thr Asp Tyr Gln Asn Leu Lys
705 710 715 720



Ser Arg Phe Ser Tyr Leu Gly Phe Gly Gly Phe Ala Ser Val Ala Asn 
725 730 735

Gln Gly Gln Ile Thr Arg Phe Gly Leu Gly Thr Gln Glu Ile Val Asn
740 745 750 

Pro Val Arg His Asp Lys Val Asn Phe Pro Phe Gly Phe Lys Phe Asn
755 760 765 



Ile Ala Val Gly Phe Ile Leu Cys Ile Ser Leu Val Ile Leu Thr Phe
770 775 780 

Gln Trp Arg Phe Tyr Leu Ser Phe Arg Lys Leu Met Arg Cys Val Leu
785 790 795 800 



Ile Leu Val Ile Ala Leu Trp Phe Ile Glu Leu Leu Asp Val Trp Ser
805 810 815

Thr Cys Thr Gln Pro Ile Cys Ala Lys Trp Thr Arg Thr Glu Ala Lys
820 825 830 

Ala Asn Glu Lys Val Met Ile Ser Glu Gly His His Val Asp Leu Pro
835 840 845 



Asn Val Ile Ile Thr Ser Leu Pro Gly Ser Gly Ala Glu Ile Leu Lys
850 855 860 

Gln Leu Phe Phe Asn Ser Ser Asp Phe Leu Tyr Ile Arg Ile Pro Thr
865 870 875 880



Ala Tyr Met Asp Ile Pro Glu Thr Glu Phe Glu Ile Asp Ser Phe Val
885 890 895

Asp Ala Cys Glu Trp Lys Val Ser Asp Ile Arg Ser Gly His Phe His
900 905 910 

Leu Leu Arg Gly Trp Leu Gln Ser Leu Val Gln Asp Thr Lys Leu His
915 920 925 



Leu Gln Asn Ile His Leu His Glu Thr Ser Arg Ser Lys Leu Ala Gln
930 935 940 

Tyr Phe Thr Thr Asn Lys Asp Lys Lys Arg Lys Leu Lys Arg Arg Glu 
945 950 955 960 
Ser Leu Gln Asp Gln Arg Ser Arg Ile Lys Gly Pro Phe Asp Arg Asp
965 970 975 

Ala Glu Tyr Ile Arg Ala Leu Arg Arg His Leu Val Tyr Tyr Pro Ser
980 985 990

Ala Arg Pro Val Leu Ser Leu Ser Ser Gly Ser Trp Thr Leu Lys Leu 
995 1000 1005 

His Phe Phe Gln Glu Val Leu Gly Thr Ser Met Arg Ala Leu Tyr Ile
1010 1015 1020 

Val Arg Asp Pro Arg Ala Trp Ile Tyr Ser Val Leu Tyr Gly Ser Lys
1025 1030 1035 1040

Pro Ser Leu Tyr Ser Leu Lys Asn Val Pro Glu His Leu Ala Lys Leu 
1045 1050 1055 

Phe Lys Ile Glu Glu Gly Lys Ser Lys Cys Asn Ser Asn Ser Gly Tyr
1060 1065 1070

Ala Phe Glu Tyr Glu Ser Leu Lys Lys Glu Leu Glu Ile Ser Gln Ser
1075 1080 1085 

Asn Ala Ile Ser Leu Leu Ser His Leu Trp Val Ala Asn Thr Ala Ala
1090 1095 1100 

Ala Leu Arg Ile Asn Thr Asp Leu Leu Pro Thr Asn Tyr His Leu Val
1105 1110 1115 1120

Lys Phe Glu Asp Ile Val His Phe Pro Gln Lys Thr Thr Glu Arg Ile
1125 1130 1135 

Phe Ala Phe Leu Gly Ile Pro Leu Ser Pro Ala Ser Leu Asn Gln Met
1140 1145 1150

Leu Phe Ala Thr Ser Thr Asn Leu Phe Tyr Leu Pro Tyr Glu Gly Glu 
1155 1160 1165 

Ile Ser Pro Ser Asn Thr Asn Ile Trp Lys Thr Asn Leu Pro Arg Asp
1170 1175 1180 

Glu Ile Lys Leu Ile Glu Asn Ile Cys Trp Thr Leu Met Asp His Leu
1185 1190 1195 1200

Gly Tyr Pro Lys Phe Met Asp 
1205 